Automatic catalog classification assigns product sheets to categories based on their content. A good hierarchical classification of products is critical to offering a good user experience in an e-shop. For the past few years, interest in automatic classification has been renewed, driven by the growth of e-commerce and of catalog sizes, the emergence of the marketplace business model, and the need to bring new products online quickly.
Automatic classification is an old problem, dating from the 1960s, and it has made huge progress thanks to increasingly powerful algorithms. Previously, it was a complex task: it relied mainly on rules written manually by experts, which had to be reviewed whenever the product sheet format changed or new categories were added. The resulting accuracy was too low for a fully automatic process, so human validation was still needed.
With deep learning, automated classification saves a lot of time and can handle larger catalogs. Deep learning is an artificial intelligence technique that derives its own rules from a training dataset (a dataset that has already been classified). A model is trained on this dataset to find the parameters that best configure the classification system. This approach avoids manual rule configuration and achieves a high accuracy rate, making a fully automatic process without human validation possible.
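To make the idea of learning parameters from already classified data concrete, here is a minimal sketch. It is an illustration only: a single-layer perceptron over word features stands in for a deep network, and the tiny `TRAINING` set, the function names and the category labels are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, already classified product titles (the "learning dataset").
TRAINING = [
    ("running shoes for men", "sport"),
    ("wireless bluetooth headphones", "tech"),
    ("cotton t-shirt blue", "clothes"),
    ("football size 5", "sport"),
    ("usb-c charging cable", "tech"),
    ("wool winter sweater", "clothes"),
]

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def predict(weights, tokens):
    """Return the label whose weights give the highest total score."""
    def score(label):
        return sum(weights[label].get(tok, 0.0) for tok in tokens)
    # Labels are visited in sorted order, so ties resolve deterministically.
    return max(sorted(weights), key=score)

def train_perceptron(examples, epochs=10):
    """Learn one weight per (label, token) pair from labeled examples."""
    labels = sorted({label for _, label in examples})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for text, gold in examples:
            tokens = tokenize(text)
            guess = predict(weights, tokens)
            if guess != gold:
                # Mistake-driven update: reward the true label and
                # penalize the wrongly predicted one.
                for tok in tokens:
                    weights[gold][tok] += 1.0
                    weights[guess][tok] -= 1.0
    return weights

if __name__ == "__main__":
    model = train_perceptron(TRAINING)
    print(predict(model, tokenize("bluetooth cable")))
```

No rule is written by hand here: the weights are adjusted only when the model misclassifies a training example, which is the principle that deep networks apply at a much larger scale.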
Automatic classification is now a reliable tool, and many deep learning classifiers are already used for other needs (for example, email classification or spam filtering).
A deep learning automatic classifier API
Historically, automatic classification relied on successive generations of algorithms, each more accurate than the last: Bayesian networks, decision trees and KNN (k-nearest neighbors), for example.
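As an illustration of one of these earlier techniques, the sketch below shows a k-nearest-neighbors classifier over product titles. It assumes Jaccard similarity on word sets and similarity-weighted voting, which are choices made for this example only; the `CATALOG` data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical labeled catalog entries.
CATALOG = [
    ("red running shoes", "sport"),
    ("trail running shoes women", "sport"),
    ("running shorts", "sport"),
    ("bluetooth speaker", "tech"),
    ("usb keyboard", "tech"),
]

def jaccard(a, b):
    """Share of distinct words that two token lists have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_classify(query, labeled, k=3):
    """Pick the label with the largest similarity mass among the k nearest items."""
    q = query.lower().split()
    scored = [(jaccard(q, text.lower().split()), label) for text, label in labeled]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = defaultdict(float)
    for similarity, label in nearest:
        # Weight each vote by similarity so unrelated neighbors count for nothing.
        votes[label] += similarity
    return max(votes, key=votes.get)

if __name__ == "__main__":
    print(knn_classify("blue running shoes", CATALOG))
    print(knn_classify("wireless bluetooth speaker", CATALOG))
```

KNN needs no training phase, but every prediction compares the query against the whole labeled catalog, which becomes costly as catalogs grow.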
ContentSide participated in the European project PAPUD (Profiling and Analysis Platform Using Deep Learning) from 2017 to 2019. Our task was to create a new automatic classification service based on deep learning. The service was trained on catalogs from large retailers, with more than 2,000 categories to predict for each catalog.
The results showed an accuracy rate (F-score) of 97%.
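For readers unfamiliar with the F-score, the following sketch shows how it can be computed from gold and predicted categories. The source does not say which averaging was used for the reported figure; macro-averaging over categories is assumed here purely for illustration, and the function names are hypothetical.

```python
def f1_per_class(gold, pred):
    """Return the F1 score of each label seen in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores

def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores)

if __name__ == "__main__":
    gold = ["sport", "sport", "tech", "tech"]
    pred = ["sport", "tech", "tech", "tech"]
    print(f1_per_class(gold, pred))
    print(macro_f1(gold, pred))
```

With thousands of categories, the choice of averaging matters: macro-averaging gives rare categories the same weight as frequent ones, while a micro-average is dominated by the largest categories.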
During the R&D of our solution, we used more than 1 million products drawn from multiple multi-level classifications, covering a wide variety of domains (sport, tech, clothes, books, grocery products, …).
The heterogeneity and complexity of these datasets require a classification system that uses all the available data in the catalog (product sheet structure, semantics, characteristics, …). Our approach is represented in the following diagram.
First, the data is preprocessed: metadata extraction, removal of unwanted words and patterns, and selection of relevant patterns.
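The preprocessing step could look roughly like the sketch below. It is not the actual ContentSide pipeline: the product sheet layout (a dict with a `description` field), the stopword list, the unwanted-pattern regex and the minimum token length are all assumptions made for the example.

```python
import re

# Hypothetical stopword list and unwanted patterns (reference codes, digits).
STOPWORDS = {"the", "a", "an", "for", "with", "and", "of"}
UNWANTED = re.compile(r"\b(?:ref|sku)[:\s]+\w+|\d+")

def preprocess(sheet):
    """Split a product sheet into metadata and a cleaned list of tokens."""
    # Metadata extraction: every field except the free-text description.
    metadata = {key: value for key, value in sheet.items() if key != "description"}
    # Removal of unwanted patterns from the lowercased description.
    text = UNWANTED.sub(" ", sheet.get("description", "").lower())
    tokens = re.findall(r"[a-z]+", text)
    # Selection: drop stopwords and very short tokens.
    kept = [tok for tok in tokens if tok not in STOPWORDS and len(tok) > 2]
    return metadata, kept

if __name__ == "__main__":
    sheet = {
        "brand": "Acme",
        "description": "Ref: AB123 The running shoes for men, size 42",
    }
    print(preprocess(sheet))
```

Keeping the metadata separate from the cleaned text lets later steps use both the structure of the sheet and its wording.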
Then we compute a set of statistical measures from the patterns extracted earlier; these measures are then vectorized to prepare for learning.
The resulting vectors contain a score for each measure and serve as the input to the neural network.
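The measure and vectorization steps can be sketched as follows. The article does not name the statistical measures actually used, so TF-IDF is taken here as an assumed stand-in; the function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    """Sorted list of every distinct token, giving each one a fixed position."""
    return sorted({tok for doc in docs for tok in doc})

def tfidf_vectors(docs):
    """Turn tokenized documents into fixed-length lists of TF-IDF scores."""
    vocab = build_vocabulary(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / doc_freq[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero on empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors

if __name__ == "__main__":
    docs = [["running", "shoes"], ["running", "shorts"], ["usb", "cable"]]
    vocab, vectors = tfidf_vectors(docs)
    print(vocab)
    for vector in vectors:
        print(vector)
```

Every product ends up as a vector of the same length, one score per vocabulary entry, which is the kind of fixed-size numeric input a neural network expects.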
This new generation of automatic classification systems achieves very high precision, making it possible to fully automate product classification in an e-shop, without human validation.
Online retailers can thus reduce their costs and enhance the user experience on their website by showing visitors a proper classification of products.
Contact the ContentSide solutions team to discover our solution: