0

I have dataframe like in the below pic.

First; I want the top 2 products, second I need the top 2 products frequents per day, so I need to group it by days and select the top 2 products from products column, I tried this code but it gives an error.

df.groupby("days", as_index=False)(["products"] == "Follow Up").count()

enter image description here

bigreddot
  • 33,642
  • 5
  • 69
  • 122
Islam Fahmy
  • 139
  • 10
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Jan 07 '20 at 13:24
  • you can use [pandas.Series.value_counts](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html) to get top-n of a column, for top-n after groupby refer [here](https://stackoverflow.com/questions/20069009/pandas-get-topmost-n-records-within-each-group) – Shijith Jan 07 '20 at 13:34

1 Answers1

0

You need to groupby over both days and products and then use size. Once you have done this you will have all the counts in the df you require.

You will then need to sort both the day and the default 0 column which now contains your counts, this has been created by resetting your index on the initial groupby.

We follow the instructions in Pandas get topmost n records within each group to give your desired result.

A full example:

Setup:

df = pd.DataFrame({'day':[1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3],
'value':['a','a','b','b','b','c','a','a','b','b','b','c','a','a','b','b','b','c']})
df.head(6)

    day value
0   1   a
1   1   a
2   1   b
3   1   b
4   1   b
5   1   c
df_counts = df.groupby(['day','values']).size().reset_index().sort_values(['day', 0], ascending = [True, False])
df_top_2 = df_counts.groupby('day').head(2)
df_top_2

   day value 0
1   1   b   3
0   1   a   2
4   2   b   3
3   2   a   2
7   3   b   3
6   3   a   2

Of course, you should rename the 0 column to something more reasonable but this is a minimal example.

B.C
  • 577
  • 3
  • 18
  • Many thanks dear this is really helped me a lot, last question can you plot this using bokeh line graph I'm sorry I'm new to python and really confused if you have any tutorials recommendation will be appreciated – Islam Fahmy Jan 08 '20 at 03:57