I have what I think is a peculiar problem: I am trying to predict product attributes that may overlap.

In my case, given the title, manufacturer, and description, I need to know whether the product is a pair of jeans or something else and, furthermore, whether it is Skinny Jeans or another type of jeans. Going through the scikit-learn exercises, it seems I can only predict one category at a time, which doesn't fit my case. Any suggestions on how to tackle the problem?

What I have in mind right now is to build training data for each category, for example: Jeans = ['desc of jeans 1', 'desc of jeans 2'], Skinny Jeans = ['desc of skinny jeans 1', 'desc of skinny jeans 2']. With this training data, I would then ask for the probability that an unknown product belongs to each category, and expect an answer like this, expressed as a match percentage: Unknown_Product_1 = { 'jeans': 93, 'skinny_jeans': 80, 't-shirt': 5 }. Am I way off base? If this is the right path to take, how do I achieve it?

Thank you!

Wahyu
  • This is hierarchical classification. There's no built-in support for that in scikit-learn. You can reduce this to multiple classification problems, or to a single multi-label problem. – Fred Foo Oct 08 '14 at 09:55

1 Answer

You are probably describing a task called multi-label learning or multi-label classification.
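
As a minimal sketch of what this could look like in scikit-learn (not the only way to set it up), the snippet below trains one binary classifier per label with `OneVsRestClassifier` and reads off a per-label percentage, similar to the dictionary in the question. The product texts, the label names, and the choice of `TfidfVectorizer` with `LogisticRegression` are assumptions made purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each product text carries every label that applies.
texts = [
    "blue denim jeans regular fit",
    "classic straight leg denim jeans",
    "skinny fit stretch denim jeans",
    "black skinny jeans slim ankle",
    "cotton crew neck t-shirt",
    "graphic print short sleeve t-shirt",
]
labels = [
    ["jeans"],
    ["jeans"],
    ["jeans", "skinny_jeans"],
    ["jeans", "skinny_jeans"],
    ["t-shirt"],
    ["t-shirt"],
]

# Turn the label lists into a binary indicator matrix (one column per label).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# One independent binary classifier per label, on top of TF-IDF features.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression()),
)
model.fit(texts, Y)

# Per-label probabilities for an unseen product; they need not sum to 1.
probs = model.predict_proba(["stretch skinny denim jeans"])[0]
scores = {label: int(round(100 * float(p))) for label, p in zip(mlb.classes_, probs)}
print(scores)
```

With so little training data the percentages are not meaningful in themselves; the point is only that each label gets its own score, so a product can score high on both `jeans` and `skinny_jeans`.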

A key difference between this task and the standard classification task is that by learning a relationship between the labels, you can sometimes obtain better performance than if you train many independent standard classifiers.
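
One way to let a model use relationships between labels, sketched here under the assumption that a reasonably recent scikit-learn is available, is `sklearn.multioutput.ClassifierChain`: each classifier in the chain receives the earlier labels as extra features, so the `skinny_jeans` classifier can condition on `jeans`. The data, the label order, and the base estimator are illustrative choices, and whether a chain beats independent classifiers depends on the data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data, same shape as a real multi-label training set.
texts = [
    "blue denim jeans regular fit",
    "classic straight leg denim jeans",
    "skinny fit stretch denim jeans",
    "black skinny jeans slim ankle",
    "cotton crew neck t-shirt",
    "graphic print short sleeve t-shirt",
]
labels = [
    ["jeans"],
    ["jeans"],
    ["jeans", "skinny_jeans"],
    ["jeans", "skinny_jeans"],
    ["t-shirt"],
    ["t-shirt"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # columns: jeans, skinny_jeans, t-shirt

# order=[0, 1, 2] puts "jeans" first, so the "skinny_jeans" classifier
# sees the "jeans" label as an additional input feature.
chain = make_pipeline(
    TfidfVectorizer(),
    ClassifierChain(LogisticRegression(), order=[0, 1, 2]),
)
chain.fit(texts, Y)

# Binary prediction (0 or 1) per label for an unseen product.
pred = chain.predict(["stretch skinny denim jeans"])
print(dict(zip(mlb.classes_, pred[0])))
```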

user1149913
  • +1. definitely multi-label classification. Probably can use a taxonomy as knowledge of the hierarchy. Pants/Jeans//Skinny Jeans – greeness Oct 07 '14 at 04:27
  • You are right, it is called multi-label, and this Stack Overflow question really helped: http://stackoverflow.com/questions/10526579/use-scikit-learn-to-classify-into-multiple-categories – Wahyu Oct 07 '14 at 19:38