extracting overlapping categories through machine learning

Question

I have what I think is a peculiar problem: I am trying to extract product attributes that may overlap.

In my case, given the title, manufacturer, and description, I need to know whether the product is jeans or something else and, furthermore, whether it is skinny jeans or another type of jeans. Going through the scikit-learn exercises, it seems I can only predict one category at a time, which doesn't fit my case. Any suggestions on how to tackle the problem?

What I have in mind right now is to have training data for each category, for example:

Jeans = ['desc of jeans 1', 'desc of jeans 2']
Skinny Jeans = ['desc of skinny jeans 1', 'desc of skinny jeans 2']

With this training data, I would then ask for the probability that a given unknown product belongs to each category, and expect an answer like this in return (percentage of match):

Unknown_Product_1 = { 'jeans': 93, 'skinny_jeans': 80, 't-shirt': 5 }

Am I way off base? If this is the right path to take, how do I achieve it?

Thank you!
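The per-category idea in the question is what is usually called binary relevance: one independent yes/no classifier per label, each giving its own probability, so the percentages do not have to sum to 100. A minimal scikit-learn sketch, assuming title, manufacturer, and description are concatenated into one text string, TF-IDF features, and logistic regression as the per-label model (all of these are illustrative choices, and the toy texts and labels are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up training data: each text stands in for "title + manufacturer +
# description" concatenated, and each product may carry several labels.
texts = [
    "slim fit skinny jeans stretch denim",
    "skinny jeans low rise dark wash denim",
    "classic straight leg jeans denim",
    "relaxed fit bootcut jeans denim",
    "cotton crew neck t-shirt short sleeve",
    "graphic print t-shirt cotton tee",
]
labels = [
    {"jeans", "skinny_jeans"},
    {"jeans", "skinny_jeans"},
    {"jeans"},
    {"jeans"},
    {"t-shirt"},
    {"t-shirt"},
]

# One indicator column per label; classes_ is ['jeans', 'skinny_jeans', 't-shirt'].
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# OneVsRestClassifier fits one independent binary classifier per column.
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression()))
model.fit(texts, Y)


def label_percentages(text):
    """Per-label probability (in percent) that `text` carries that label."""
    probs = model.predict_proba([text])[0]
    return {str(label): round(100 * float(p)) for label, p in zip(mlb.classes_, probs)}


print(label_percentages("black skinny jeans with stretch"))
```

With real data you would want far more examples per label; on six toy products the probabilities stay fairly close to the label frequencies.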

This is hierarchical classification. There's no built-in support for that in scikit-learn. You can reduce this to multiple classification problems, or to a single multi-label problem. — Fred Foo, Oct 08 '14 at 09:55
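A minimal sketch of the first reduction mentioned in the comment (several ordinary classification problems), assuming a two-level hierarchy and logistic regression at each level; the data and the `hierarchical_percentages` helper are hypothetical. Level 1 picks exactly one product type, level 2 is trained only on jeans, and the two probabilities are multiplied, so "skinny jeans" can never score above "jeans":

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up data: (text, top-level category, jeans subtype or None).
products = [
    ("slim fit skinny jeans stretch denim", "jeans", "skinny"),
    ("skinny jeans low rise dark wash denim", "jeans", "skinny"),
    ("classic straight leg jeans denim", "jeans", "other"),
    ("relaxed fit bootcut jeans denim", "jeans", "other"),
    ("cotton crew neck t-shirt short sleeve", "t-shirt", None),
    ("graphic print t-shirt cotton tee", "t-shirt", None),
]

# Level 1: product type, trained on every product (mutually exclusive classes).
type_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
type_clf.fit([t for t, _, _ in products], [c for _, c, _ in products])

# Level 2: jeans subtype, trained only on the products that are jeans.
jeans = [(t, s) for t, c, s in products if c == "jeans"]
subtype_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
subtype_clf.fit([t for t, _ in jeans], [s for _, s in jeans])


def hierarchical_percentages(text):
    """Chain the levels: P(skinny jeans) = P(jeans) * P(skinny | jeans)."""
    p_type = dict(zip(type_clf.classes_, type_clf.predict_proba([text])[0]))
    p_sub = dict(zip(subtype_clf.classes_, subtype_clf.predict_proba([text])[0]))
    p_jeans = float(p_type["jeans"])
    return {
        "jeans": round(100 * p_jeans),
        "skinny_jeans": round(100 * p_jeans * float(p_sub["skinny"])),
        "t-shirt": round(100 * float(p_type["t-shirt"])),
    }


print(hierarchical_percentages("black skinny jeans with stretch"))
```

Each extra level adds one more classifier trained on the subset of products under its parent node.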

score 1 · Accepted Answer · answered Oct 07 '14 at 03:03

1

You are probably describing a task called multi-label learning or multi-label classification.

A key difference between this task and standard classification is that, by learning the relationships between labels, you can sometimes obtain better performance than by training many independent standard classifiers.
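One concrete way to exploit label relationships in scikit-learn is `ClassifierChain` (from `sklearn.multioutput`); it is one technique among several, not the only one. Each link in the chain is trained on the text features plus the labels of the links before it (the true labels during `fit`, the chain's own earlier predictions afterwards), so the `skinny_jeans` model can use whether the product is jeans. A sketch assuming TF-IDF features, logistic regression for every link, and a small made-up dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up training data; a product may carry several labels at once.
texts = [
    "slim fit skinny jeans stretch denim",
    "skinny jeans low rise dark wash denim",
    "classic straight leg jeans denim",
    "relaxed fit bootcut jeans denim",
    "cotton crew neck t-shirt short sleeve",
    "graphic print t-shirt cotton tee",
]
labels = [
    {"jeans", "skinny_jeans"},
    {"jeans", "skinny_jeans"},
    {"jeans"},
    {"jeans"},
    {"t-shirt"},
    {"t-shirt"},
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # columns: jeans, skinny_jeans, t-shirt

# Link 0 predicts "jeans" from the text alone; link 1 ("skinny_jeans") also
# sees the jeans label; link 2 ("t-shirt") sees both earlier labels.
chain = make_pipeline(
    TfidfVectorizer(),
    ClassifierChain(LogisticRegression(), order=[0, 1, 2]),
)
chain.fit(texts, Y)

probs = chain.predict_proba(["black skinny jeans with stretch"])[0]
print({str(label): round(float(p), 2) for label, p in zip(mlb.classes_, probs)})
```

The chain order is a design choice: putting the general label ("jeans") before the specific one ("skinny_jeans") mirrors the hierarchy from the question.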

answered Oct 07 '14 at 03:03

user1149913

4,463
1
23
28

+1. Definitely multi-label classification. You could probably also use a taxonomy to encode knowledge of the hierarchy, e.g. Pants/Jeans/Skinny Jeans. – greeness Oct 07 '14 at 04:27
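One possible reading of the taxonomy suggestion, sketched in plain Python; the `/`-separated path format and both helper names are hypothetical. On the training side, each product's leaf category is expanded into all of its ancestors, so the hierarchy is baked into the multi-label targets. On the prediction side, each child's probability is capped at its parent's, so "skinny jeans" never scores above "jeans":

```python
def expand_path(path, sep="/"):
    """'Pants/Jeans/Skinny Jeans' -> the label itself plus all of its ancestors."""
    # Empty segments (e.g. from a doubled separator) are skipped.
    parts = [p.strip() for p in path.split(sep) if p.strip()]
    return {sep.join(parts[: i + 1]) for i in range(len(parts))}


def make_consistent(probs, sep="/"):
    """Cap each label's probability at its parent's, so P(child) <= P(parent).

    `probs` maps full taxonomy paths to probabilities; parents are handled
    before children because labels are visited in order of depth.
    """
    fixed = {}
    for label in sorted(probs, key=lambda l: l.count(sep)):
        parent = label.rsplit(sep, 1)[0] if sep in label else None
        p = probs[label]
        if parent is not None and parent in fixed:
            p = min(p, fixed[parent])
        fixed[label] = p
    return fixed


# Training side: a leaf path becomes a set of multi-label targets.
print(sorted(expand_path("Pants/Jeans/Skinny Jeans")))
# ['Pants', 'Pants/Jeans', 'Pants/Jeans/Skinny Jeans']

# Prediction side: made-up per-label probabilities that violate the hierarchy.
print(make_consistent({"Pants": 0.9, "Pants/Jeans": 0.6, "Pants/Jeans/Skinny Jeans": 0.8}))
# {'Pants': 0.9, 'Pants/Jeans': 0.6, 'Pants/Jeans/Skinny Jeans': 0.6}
```

The expanded label sets can be passed straight to a multi-label binarizer, and the capping step can be applied to whatever per-label probabilities the classifier returns.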
You are right, it is called multi-label classification, and this Stack Overflow question really helped: http://stackoverflow.com/questions/10526579/use-scikit-learn-to-classify-into-multiple-categories – Wahyu Oct 07 '14 at 19:38

extracting overlapping categories through machine learning

1 Answers1