You need to groupby
over both days and products and then use size
. Once you have done this you will have all the counts in the df
you require.
You will then need to sort both the day
and the default 0
column which now contains your counts, this has been created by resetting your index on the initial groupby
.
We follow the instructions in Pandas get topmost n records within each group to give your desired result.
A full example:
Setup:
df = pd.DataFrame({'day':[1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3],
'value':['a','a','b','b','b','c','a','a','b','b','b','c','a','a','b','b','b','c']})
df.head(6)
day value
0 1 a
1 1 a
2 1 b
3 1 b
4 1 b
5 1 c
df_counts = df.groupby(['day','values']).size().reset_index().sort_values(['day', 0], ascending = [True, False])
df_top_2 = df_counts.groupby('day').head(2)
df_top_2
day value 0
1 1 b 3
0 1 a 2
4 2 b 3
3 2 a 2
7 3 b 3
6 3 a 2
Of course, you should rename the 0
column to something more reasonable but this is a minimal example.