1

So i have a data like this

Category subcategory crn product1 product2 product3 product4 product5 product6 product7
A X 1 1 1 1 1 1 1 1
A Y 1 1 1 1 1 1 1 1
A Z 1 1 1 1 1 1 1 1
B X 1 1 1 1 1 1 1 1

And i want to showcase output like this

Category subcategory crncount topproduct1 topproduct2 topproduct3 topproduct4 topproduct5 sumofproduct1 sumof product2 sumofprduct3 sumofproduct4 sumofprodct5 sumofproduct6 sumofprodct7
A x 1 product3 product2 product4 product5 product6 1 1 1 1 1 1 1
A y 1 product3 product4 product5 product6 product7 1 1 1 1 1 1 1
B x 1 product1 product3 product4 product5 product6 1 1 1 1 1 1 1
B y 1 product3 product2 product4 product5 product6 1 1 1 1 1 1 1

I have tried this code df.groupby([category, subcategory)] group.agg({crn: count, product1:sum, product2:sum, product3:sum......)

Topproduct = df.iloc[:,1:].sum().nlargest(5).index.tolist()

Alok
  • 11
  • 5
  • "and vote for it" lol your post doesn't even answer his question.. – Sparkling Marcel Jun 23 '23 at 06:59
  • are all the values 1? why is product3 the topproduct1 for the first row? where sis the B/Y come from? please make sure your examples are meaningful with non-ambiguous numbers and that the provided output matches the input – mozway Jun 23 '23 at 07:09

1 Answers1

1

You can melt, sort_values, groupby.head, and pivot back:

cols = ['Category', 'subcategory', 'crn']
N = 5

out = (df
   .melt(cols, var_name='topproduct', value_name='sumofproducts')
   .sort_values(by=cols+['sumofproducts'], ascending=False)
   .groupby(cols).head(N)
   .assign(col=lambda d: d.groupby(cols).cumcount().add(1))
   .pivot(index=cols, columns='col')
   .pipe(lambda d: d.set_axis(d.columns.map(lambda x: f'{x[0]}_{x[1]}'), axis=1))
   .reset_index()
)

Output:

  Category subcategory  crn topproduct_1 topproduct_2 topproduct_3 topproduct_4 topproduct_5  sumofproducts_1  sumofproducts_2  sumofproducts_3  sumofproducts_4  sumofproducts_5
0        A           X    1     product6     product5     product1     product3     product4                9                7                5                3                3
1        A           Y    1     product6     product7     product4     product5     product1                8                8                7                6                5
2        A           Z    1     product5     product3     product4     product2     product7                8                7                7                6                5
3        B           X    1     product1     product3     product2     product4     product5                9                9                8                4                3

Used input:

df = pd.DataFrame({'Category': ['A', 'A', 'A', 'B'],
                   'subcategory': ['X', 'Y', 'Z', 'X'],
                   'crn': [1, 1, 1, 1],
                   'product1': [5, 5, 1, 9],
                   'product2': [0, 2, 6, 8],
                   'product3': [3, 4, 7, 9],
                   'product4': [3, 7, 7, 4],
                   'product5': [7, 6, 8, 3],
                   'product6': [9, 8, 1, 0],
                   'product7': [3, 8, 5, 3]})
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Well, you had the logic the rest is mostly details ;) – mozway Jun 23 '23 at 07:20
  • your answer is much nicer - nice one! – Umar.H Jun 23 '23 at 07:26
  • Getting an error values is not ordered, please explicitly specify the categories order by passing in a categories argument – Alok Jun 23 '23 at 08:08
  • @Alok please provide a [minimal reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) that triggers your error – mozway Jun 23 '23 at 08:30
  • Issue resolved i more thing i want to showcase the count of crn – Alok Jun 26 '23 at 07:34