1

I have a dataframe similar to this:

id    name  status output
123    John   a      33.33%
232    Wang   b      50%
324    Wang   a      50%
424    Cici   a      100%
553    John   b      33.33%
653    John   b      33.33%

I need to 1) groupby name 2) count the percentage where status == a. Output is listed on the right

I used following code:

df['output'] = df.groupby('name')['id'].transform(lambda x: x[x['status == a']].count/len(x))

but the return was wrong, is there anyway I can fix this.

totolow
  • 93
  • 6
  • I think this will work: `df.groupby('name').transform(lambda x: x[x['status' == 'a']].count()/len(x))` – ashkangh Mar 11 '21 at 18:33

3 Answers3

2

Also try this without a lambda function:

df['output'] = df['status'].eq('a').groupby(df['name']).transform('mean')
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
0

use

df['output'] = df.groupby('name')['status'].transform(lambda x: (x=='a').mean())

You have some syntax mistakes in comparing and count is a function not attribute

Raghava Dhanya
  • 959
  • 15
  • 22
  • While this _will work_ the `lambda` requires a slow python loop over the groups which can be nearly 100x slower. https://stackoverflow.com/questions/63306888/general-groupby-in-python-pandas-fast-way/63307380#63307380, which is an issue for larger datasets (the example there is this exact manipulation, by coincidene) – ALollz Mar 11 '21 at 19:08
  • Yes, I agree, I was just modifying OP's code to fix it. Quang Hoang's answer is much better I think for larger dataset. @ALollz – Raghava Dhanya Mar 12 '21 at 06:09
0

Try with crosstab

df['out'] = df.name.map(pd.crosstab(df.name,df.status,normalize='index')['a'])
df
Out[60]: 
    id  name status  output       out
0  123  John      a  33.33%  0.333333
1  232  Wang      b     50%  0.500000
2  324  Wang      a     50%  0.500000
3  424  Cici      a    100%  1.000000
4  553  John      b  33.33%  0.333333
5  653  John      b  33.33%  0.333333
BENY
  • 317,841
  • 20
  • 164
  • 234