Pandas aggregating category counts by user

Question

How can I use pandas to compute frequency counts for each user (author) in each category? I want these counts so that I can pivot them into a utility matrix.

|   | **author** | **category** |
|---|---|---|
| 0 | A | movies |
| 1 | B | games |
| 2 | C | pics |
| 4 | A | movies |
| 5 | C | movies |
| 6 | B | games |




| **author** | **category** | **category count** |
|---|---|---|
| A | movies | 2 |
| B | games | 2 |
| C | movies | 1 |
| C | pics | 1 |

score 0 · Accepted Answer · edited Sep 27 '17 at 16:49

You can use groupby with size to count the rows for each combination of author and category. The output is a Series with a MultiIndex.

print (df.groupby(['author','category']).size())
author  category
A       movies      2
B       games       2
C       movies      1
        pics        1
dtype: int64

Then add reset_index to turn the MultiIndex levels into columns, and use its name parameter to name the count column. The output is a DataFrame:

# assign to a new name so the original df stays available for the steps below
df1 = df.groupby(['author','category']).size().reset_index(name='category count')
print (df1)
  author category  category count
0      A   movies               2
1      B    games               2
2      C   movies               1
3      C     pics               1
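
For anyone who wants to reproduce the two steps above, here is a minimal sketch. The DataFrame constructor call is an assumption, since the question only shows the printed table, and the names `counts` and `counts_df` are illustrative. It also uses a default 0..5 index rather than the question's 0, 1, 2, 4, 5, 6, which does not affect the counts.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction).
df = pd.DataFrame({
    "author": ["A", "B", "C", "A", "C", "B"],
    "category": ["movies", "games", "pics", "movies", "movies", "games"],
})

# size() counts the rows in each (author, category) group and returns
# a Series indexed by a MultiIndex.
counts = df.groupby(["author", "category"]).size()

# reset_index() turns the index levels into ordinary columns;
# name= gives the column of counts its label.
counts_df = counts.reset_index(name="category count")
print(counts_df)
```

Keeping the result under a separate name leaves the original `df` untouched, so it can still be used for the reshaping variants.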

If you need a cross-tabulation instead, there are several solutions:

# add unstack to reshape; fill_value=0 fills missing combinations with 0
df1 = df.groupby(['author','category']).size().unstack(fill_value=0)
print (df1)
category  games  movies  pics
author                       
A             0       2     0
B             2       0     0
C             0       1     1

# crosstab counts the author/category combinations directly
df1 = pd.crosstab(df['author'],df['category'])
print (df1)
category  games  movies  pics
author                       
A             0       2     0
B             2       0     0
C             0       1     1

# pivot_table with aggfunc='size' counts the rows in each cell
df1 = df.pivot_table(index='author',columns='category', aggfunc='size', fill_value=0)
print (df1)
category  games  movies  pics
author                       
A             0       2     0
B             2       0     0
C             0       1     1
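
One way to check that these reshaping approaches agree is to compare their labels and values. The sketch below does this for the unstack and crosstab variants only. The sample frame is reconstructed from the question (an assumption), and the names `via_unstack` and `via_crosstab` are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction).
df = pd.DataFrame({
    "author": ["A", "B", "C", "A", "C", "B"],
    "category": ["movies", "games", "pics", "movies", "movies", "games"],
})

# Approach 1: group, count, then move 'category' from the index to the columns.
via_unstack = df.groupby(["author", "category"]).size().unstack(fill_value=0)

# Approach 2: let crosstab count the combinations directly.
via_crosstab = pd.crosstab(df["author"], df["category"])

# Compare row labels, column labels, and the counts themselves.
same_labels = (
    list(via_unstack.index) == list(via_crosstab.index)
    and list(via_unstack.columns) == list(via_crosstab.columns)
)
same_values = bool((via_unstack.to_numpy() == via_crosstab.to_numpy()).all())
print(same_labels and same_values)
```

Either table can serve as the utility matrix from the question: one row per author, one column per category, and zeros where an author has no posts in a category.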

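EDIT:

See also: What is the difference between size and count in pandas? In short, size counts every row in a group, including NaN values, while count counts only non-NaN values.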
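
A minimal sketch of that difference, using a hypothetical frame with one missing value (the names `df_nan` and `score` are illustrative and not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: author A has one missing score.
df_nan = pd.DataFrame({
    "author": ["A", "A", "B"],
    "score": [1.0, np.nan, 3.0],
})

grouped = df_nan.groupby("author")["score"]

# size() counts every row in the group, NaN included.
sizes = grouped.size()

# count() counts only the non-NaN values.
counts = grouped.count()

print(sizes)
print(counts)
```

For author A, size reports 2 rows while count reports 1 value. When the grouped columns contain no missing values, as in the question, the two give the same numbers.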

Awesome, thanks for a working solution. You even went the extra mile to show me the code for the utility matrix. If you don't mind, could you explain why using size/reset_index does what it does? — Vince Kumar, Mar 27 '17 at 07:29
I'll try to add some explanation. [10min to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html) and the [cookbook](http://pandas.pydata.org/pandas-docs/stable/cookbook.html) may also help. If something is unclear, I'll try to explain more. — jezrael, Mar 27 '17 at 07:35
Thank you! [size](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.size.html) has no description in the documentation, so I was pretty confused, but it makes sense now. Although I think it is an oddly named method. — Vince Kumar, Mar 27 '17 at 07:39
Yes, there is also a count function, but it is a bit different. See the last edit; I added a link for a better explanation. — jezrael, Mar 27 '17 at 07:41
