I am trying to get top5 products from a list of products

Question

So i have a data like this

Category	subcategory	crn	product1	product2	product3	product4	product5	product6	product7
A	X	1	1	1	1	1	1	1	1
A	Y	1	1	1	1 1	1	1	1
A	Z	1	1	1	1	1	1	1	1
B	X 1	1	1	1	1	1	1	1

And i want to showcase output like this

Category	subcategory	crncount	topproduct1	topproduct2	topproduct3	topproduct4	topproduct5	sumofproduct1	sumof product2	sumofprduct3	sumofproduct4	sumofprodct5	sumofproduct6	sumofprodct7
A	x	1	product3	product2	product4	product5	product6	1	1	1	1	1	1	1
A	y	1	product3	product4	product5	product6	product7	1	1	1	1	1	1	1
B	x	1	product1	product3	product4	product5	product6	1	1	1	1	1	1	1
B	y	1	product3	product2	product4	product5	product6	1	1	1	1	1	1	1

I have tried this code df.groupby([category, subcategory)] group.agg({crn: count, product1:sum, product2:sum, product3:sum......)

Topproduct = df.iloc[:,1:].sum().nlargest(5).index.tolist()

"and vote for it" lol your post doesn't even answer his question.. — Sparkling Marcel, Jun 23 '23 at 06:59
are all the values 1? why is product3 the topproduct1 for the first row? where sis the B/Y come from? please make sure your examples are meaningful with non-ambiguous numbers and that the provided output matches the input — mozway, Jun 23 '23 at 07:09

mozway · Answer 1 · 2023-06-23T07:22:48.473

You can melt, sort_values, groupby.head, and pivot back:

cols = ['Category', 'subcategory', 'crn']
N = 5

out = (df
   .melt(cols, var_name='topproduct', value_name='sumofproducts')
   .sort_values(by=cols+['sumofproducts'], ascending=False)
   .groupby(cols).head(N)
   .assign(col=lambda d: d.groupby(cols).cumcount().add(1))
   .pivot(index=cols, columns='col')
   .pipe(lambda d: d.set_axis(d.columns.map(lambda x: f'{x[0]}_{x[1]}'), axis=1))
   .reset_index()
)

Output:

  Category subcategory  crn topproduct_1 topproduct_2 topproduct_3 topproduct_4 topproduct_5  sumofproducts_1  sumofproducts_2  sumofproducts_3  sumofproducts_4  sumofproducts_5
0        A           X    1     product6     product5     product1     product3     product4                9                7                5                3                3
1        A           Y    1     product6     product7     product4     product5     product1                8                8                7                6                5
2        A           Z    1     product5     product3     product4     product2     product7                8                7                7                6                5
3        B           X    1     product1     product3     product2     product4     product5                9                9                8                4                3

Used input:

df = pd.DataFrame({'Category': ['A', 'A', 'A', 'B'],
                   'subcategory': ['X', 'Y', 'Z', 'X'],
                   'crn': [1, 1, 1, 1],
                   'product1': [5, 5, 1, 9],
                   'product2': [0, 2, 6, 8],
                   'product3': [3, 4, 7, 9],
                   'product4': [3, 7, 7, 4],
                   'product5': [7, 6, 8, 3],
                   'product6': [9, 8, 1, 0],
                   'product7': [3, 8, 5, 3]})

Getting an error values is not ordered, please explicitly specify the categories order by passing in a categories argument — Alok, Jun 23 '23 at 08:08
@Alok please provide a [minimal reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) that triggers your error — mozway, Jun 23 '23 at 08:30
Issue resolved i more thing i want to showcase the count of crn — Alok, Jun 26 '23 at 07:34

I am trying to get top5 products from a list of products

1 Answers1