42

I have the following data frame:

data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})

product_id  user_id
    p1       a1
    p1       a1
    p2       a1
    p1       a2
    p1       a2
    p1       a2
    p2       a3
    p2       a3
    p3       a3

in real case there might be some other columns as well, but what i need to do is to group by data frame by product_id and user_id columns and count number of each combination and add it as a new column in a new dat frame

output should be something like this:

user_id product_id  count
a1       p1            2
a1       p2            1
a2       p1            3
a3       p2            2
a3       p3            1

I have tried the following code:

grouped=data.groupby(['user_id','product_id']).count()

but the result is:

user_id product_id
 a1       p1
          p2
 a2       p1
 a3       p2
          p3

actually the most important thing for me is to have a column names count that has the number of occurrences , i need to use the column later.

zabop
  • 6,750
  • 3
  • 39
  • 84
chessosapiens
  • 3,159
  • 10
  • 36
  • 58
  • Possible duplicate of [Python: get a frequency count based on two columns (variables) in pandas dataframe](http://stackoverflow.com/questions/33271098/python-get-a-frequency-count-based-on-two-columns-variables-in-pandas-datafra) – Merlin Aug 13 '16 at 13:33

3 Answers3

65

Maybe this is what you want?

>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64
Nehal J Wani
  • 16,071
  • 3
  • 64
  • 89
8

In Pandas 1.1.0 you can use the method value_counts with DataFrames:

df.value_counts()

Output:

product_id  user_id
p1          a2         3
p2          a3         2
p1          a1         2
p3          a3         1
p2          a1         1

If you need a DataFrame:

df.value_counts().to_frame('counts').reset_index()

Output:

  product_id user_id  counts
0         p1      a2       3
1         p2      a3       2
2         p1      a1       2
3         p3      a3       1
4         p2      a1       1
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
0

Building a little on @Nehal's response if you want to create another column with the results combined:

you can combine the product_id and size columns:

new_df['combo'] = new_df['product_id'].map(str) + '(' + new_df['size'].map(str) +')'

...and then use groupby again to organize that column by 'user_id'

new_df['combo'].astype(str).groupby(new_df['userid']).agg(['size',', '.join])
sparrow
  • 10,794
  • 12
  • 54
  • 74