1

I have a data frame with multiple columns and I want to use count after group by such that it is applied to the combination of 2 or more columns. for example, let's say I have two columns:

user_id  product_name
1        Apple
1        Banana
1        Apple
2        Carrot
2        Tomato
2        Carrot
2        Tomato 
3        Milk
3        Cucumber

...

What I want to achieve is something like this:

user_id  product_name Product_Count_per_User
1        Apple        1 
1        Banana       2
2        Carrot       2
2        Tomato       2
3        Milk         1
3        Cucumber     1

I cannot get it. I tried this:

dcf6 = df3.groupby(['user_id','product_name'])['user_id', 'product_name'].count()

but it does not seem to get what I want and it is displaying 4 columns instead of 3. How to do to it? Thanks.

mj1261829
  • 1,200
  • 3
  • 26
  • 53

3 Answers3

3

You are counting two columns at the same time, you can just use groupby.size:

(df.groupby(['user_id', 'Product_Name']).size()
   .rename('Product_Count_per_User').reset_index())

enter image description here

Or count only one column:

df.groupby(['user_id','Product_Name'])['user_id'].size()
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • Hmmm, count is used for non NaNs counting, so here better is size. – jezrael Jul 31 '17 at 13:47
  • @jezrael Sure. But `groupby` drops `nan` values by default, so I guess it doesn't matter that much in this case now, as he is counting the group variables anyway. But I agree `size` is a better choice here. – Psidom Jul 31 '17 at 13:50
3

Use GroupBy.size:

dcf6 = df3.groupby(['user_id','Product_Name']).size()
          .reset_index(name='Product_Count_per_User')
print (dcf6)
   user_id Product_Name  Product_Count_per_User
0        1        Apple                       2
1        1       Banana                       1
2        2       Carrot                       2
3        2       Tomato                       2
4        3     Cucumber                       1
5        3         Milk                       1

What is the difference between size and count in pandas?

Graham
  • 7,431
  • 18
  • 59
  • 84
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Base on your own code , just do this .

df.groupby(['user_id','product_name'])['user_id'].
          agg({'Product_Count_per_User':'count'}).reset_index(level=1)

        product_name  Product_Count_per_User
user_id                      
1              Apple        2
1             Banana        1
2             Carrot        2
2             Tomato        2
3           Cucumber        1
3               Milk        1
BENY
  • 317,841
  • 20
  • 164
  • 234