I want to segment a dataset of items (identified by an ID) that have several categorical features, each taking its own set of values (for instance, color takes 'blue', 'orange', 'green'; size takes 'S', 'M', 'L'; brand takes 'Brand A', 'Brand B', etc.):

ID   Brand     Color    Size   Price
1    Brand 1   Orange   S      23
2    Brand 2   Blue     XXL    3
3    Brand 1   Green    XXXL   45
4    Brand 2   Blue     M      200

I can easily do it by hand for one or two features (with a small number of values). For example, if I segment by brand I get:

ID   Brand     Color    Size   Price
1    Brand 1   Orange   S      23
3    Brand 1   Green    XXXL   45

and

ID   Brand     Color    Size   Price
2    Brand 2   Blue     XXL    3
4    Brand 2   Blue     M      200

Unfortunately, some features take 10+ values. Moreover, the number of subsets explodes as soon as I segment on more than one feature. I want to test different levels of segmentation (e.g. color + brand, color + brand + size), which is why doing it by hand is not an option.
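To make the growth concrete, here is a minimal sketch, assuming the data lives in a pandas DataFrame, that counts how many non-empty subsets each level of segmentation produces on the sample above. The names `df`, `features` and `counts` are only illustrative.

```python
from itertools import combinations

import pandas as pd

# Sample rows copied from the table above.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Brand": ["Brand 1", "Brand 2", "Brand 1", "Brand 2"],
        "Color": ["Orange", "Blue", "Green", "Blue"],
        "Size": ["S", "XXL", "XXXL", "M"],
        "Price": [23, 3, 45, 200],
    }
)

features = ["Brand", "Color", "Size"]

# For every combination of features, count the groups that actually occur.
counts = {}
for r in range(1, len(features) + 1):
    for combo in combinations(features, r):
        counts[combo] = df.groupby(list(combo)).ngroups

for combo, n in counts.items():
    print(" + ".join(combo), "->", n, "subsets")
```

Only combinations of values that appear in the data are counted, so the real number of subsets is usually well below the product of the number of values per feature.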

I am trying to write a function that takes the dataframe and a list of features as input and outputs all the resulting subsets, but so far my code does not work.
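For reference, the kind of function I have in mind could look roughly like the sketch below. It assumes pandas and relies on `DataFrame.groupby`; the function name `segment` and the choice to return a dict keyed by tuples of feature values are my own assumptions, not something I have working yet.

```python
import pandas as pd

# Sample rows copied from the table above.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Brand": ["Brand 1", "Brand 2", "Brand 1", "Brand 2"],
        "Color": ["Orange", "Blue", "Green", "Blue"],
        "Size": ["S", "XXL", "XXXL", "M"],
        "Price": [23, 3, 45, 200],
    }
)


def segment(frame, features):
    """Return {tuple of feature values: sub-dataframe} for each observed combination."""
    subsets = {}
    for key, group in frame.groupby(list(features)):
        # Older pandas versions yield a scalar key for a single feature;
        # normalize so that keys are always tuples.
        if not isinstance(key, tuple):
            key = (key,)
        subsets[key] = group
    return subsets


by_brand = segment(df, ["Brand"])
by_brand_color = segment(df, ["Brand", "Color"])

for key, subset in by_brand_color.items():
    print(key)
    print(subset, end="\n\n")
```

With this shape, `by_brand[("Brand 1",)]` would hold the rows with IDs 1 and 3, and changing the level of segmentation only means passing a different list of features.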

Thank you in advance if you think you can help me!

  • Not enough reputation to comment, but [see this thread](https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key) – Cory Nezin Feb 15 '22 at 15:18
  • Thanks @CoryNezin for this research line :) –  Feb 16 '22 at 08:50