groupby and then count by status

Question

I have a dataframe similar to this:

id    name  status output
123    John   a      33.33%
232    Wang   b      50%
324    Wang   a      50%
424    Cici   a      100%
553    John   b      33.33%
653    John   b      33.33%

I need to 1) groupby name 2) count the percentage where status == a. Output is listed on the right

I used following code:

df['output'] = df.groupby('name')['id'].transform(lambda x: x[x['status == a']].count/len(x))

but the return was wrong, is there anyway I can fix this.

I think this will work: `df.groupby('name').transform(lambda x: x[x['status' == 'a']].count()/len(x))` — ashkangh, Mar 11 '21 at 18:33

score 2 · Answer 1 · answered Mar 11 '21 at 18:38

2

Also try this without a lambda function:

df['output'] = df['status'].eq('a').groupby(df['name']).transform('mean')

answered Mar 11 '21 at 18:38

Quang Hoang

146,074
10
56
74

Note that there is a `df` inside groupby() – Quang Hoang Mar 11 '21 at 20:38

score 0 · Answer 2 · answered Mar 11 '21 at 18:35

0

use

df['output'] = df.groupby('name')['status'].transform(lambda x: (x=='a').mean())

You have some syntax mistakes in comparing and count is a function not attribute

answered Mar 11 '21 at 18:35

Raghava Dhanya

959
15
22

While this _will work_ the `lambda` requires a slow python loop over the groups which can be nearly 100x slower. https://stackoverflow.com/questions/63306888/general-groupby-in-python-pandas-fast-way/63307380#63307380, which is an issue for larger datasets (the example there is this exact manipulation, by coincidene) – ALollz Mar 11 '21 at 19:08
Yes, I agree, I was just modifying OP's code to fix it. Quang Hoang's answer is much better I think for larger dataset. @ALollz – Raghava Dhanya Mar 12 '21 at 06:09

score 0 · Answer 3 · answered Mar 11 '21 at 18:43

Try with crosstab

df['out'] = df.name.map(pd.crosstab(df.name,df.status,normalize='index')['a'])
df
Out[60]: 
    id  name status  output       out
0  123  John      a  33.33%  0.333333
1  232  Wang      b     50%  0.500000
2  324  Wang      a     50%  0.500000
3  424  Cici      a    100%  1.000000
4  553  John      b  33.33%  0.333333
5  653  John      b  33.33%  0.333333

groupby and then count by status

3 Answers3