1

I have a DataFrame with two columns "A" and "B".

   A          B   
0  foo      one 
1  bar      one 
2  foo      two 
3  bar      one
4  foo      two 
5  bar      two 
6  foo      one 
7  foo      one
8  xyz      one

For each group in "A", I'm trying to get the count of each value of "B", i.e. each sub-group of B, but aggregated on the grouping of "A".

The result should look like this:

   A          B   countOne  countTwo
0  foo      one          3         2
1  bar      one          2         1
2  foo      two          3         2
3  bar      one          2         1
4  foo      two          3         2
5  bar      two          2         1
6  foo      one          3         2
7  foo      one          3         2
8  xyz      one          1         0

I have tried several approaches to no avail, so far I'm using this approach:

A_grouped = df.groupby(['A', 'B'])['A'].count()
A_grouped_ones = A_grouped[:,'one']
A_grouped_twos = A_grouped[:,'two']
df['countOne'] = df['A'].map(lambda a: A_grouped_ones[a] if a in A_grouped_ones else 0)
df['countTwo'] = df['A'].map(lambda a: A_grouped_twos[a] if a in A_grouped_twos else 0)

However, this seems horribly inefficient two me. Is there a better solution?

jessepeng
  • 610
  • 6
  • 21

1 Answers1

3

You can use unstack with add_prefix for new DataFrame and join to original:

df1 = df.groupby(['A', 'B'])['A'].count().unstack(fill_value=0).add_prefix('count_')
print (df1)
B    count_one  count_two
A                        
bar          2          1
foo          3          2
xyz          1          0

df = df.join(df1, on='A')
print (df)
     A    B  count_one  count_two
0  foo  one          3          2
1  bar  one          2          1
2  foo  two          3          2
3  bar  one          2          1
4  foo  two          3          2
5  bar  two          2          1
6  foo  one          3          2
7  foo  one          3          2
8  xyz  one          1          0

Another alternative is use size:

df1 = df.groupby(['A', 'B']).size().unstack(fill_value=0).add_prefix('count_')

Differences are size includes NaN values, count does not - check this answer.

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252