Selecting rows with groupby, summing columns and creating new column with the sum for all groupby elements

Question

I have this dataframe:

nome       code  tipo   score
Alexandre   AAA  Frads  4000
Alexandre   AAA  Memb   10000
Alexandre   AAA  Memb   20000
Bruno       BBB  Dans   10000
Bruno       BBB  Grap   4000

Values available in this Google Sheets

I need to create a new column summing the rows with same nome and code where tipo = 'Memb', in a way that it looks like this:

I tried groupby with transform('sum') however it is getting me the wrong result.

df['score'].loc[df['tipo'] == "Memb"]=df[['nome','code','score']].groupby(['nome','code'])['score'].transform('sum')

What am I missing?

Please include dataframes that I can copy and paste, I could work on a solution right now but I don't want to type everything in... check out [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — timgeb, May 31 '20 at 11:43
Oh, sorry about that, added a Google Sheets link, does it solve it? — aabujamra, May 31 '20 at 11:47
https://docs.google.com/spreadsheets/d/1BRibP52tqoHEHr1xcJgx0J-ecmaJ6NA24rb6p9IZlNM/edit#gid=0 — aabujamra, May 31 '20 at 11:47
It's better in the question itself but it prevents having to type it in, thanks. — timgeb, May 31 '20 at 11:59
I've posted an answer, please check it if that solves your problem — EXODIA, May 31 '20 at 12:20

score 3 · Accepted Answer · answered May 31 '20 at 12:32

For improve performance is possible replace score to 0 values by Series.mask and then use GroupBy.transform with sum:

df['Memb_sum']  = (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0))
                     .groupby(['nome','code'])['score']
                     .transform('sum'))
print (df)
        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0

Details:

print (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0)))

        nome code   tipo  score
0  Alexandre  AAA  Frads      0
1  Alexandre  AAA   Memb  10000
2  Alexandre  AAA   Memb  20000
3      Bruno  BBB   Dans      0
4      Bruno  BBB   Grap      0

By the way, it would be good if there was a version of `transform` that got passed the whole group, similar to `apply`. — timgeb, May 31 '20 at 13:19

Ch3steR · Answer 2 · 2020-05-31T12:41:43.603

You can try this.

Set 'tipo' as index using df.set_index, then group values with similar nome and code using df.groupby and use df.transform and sum of those indices which are equal to Memb

df['Memb_sum'] =  (df.set_index('tipo').
                     groupby(['nome','code']).score.
                     transform(lambda x:x.loc[x.index=='Memb'].sum()).
                     values)

Output:

        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0

EXODIA · Answer 3 · 2020-05-31T12:16:35.703

import numpy as np

df['Memb_sum']=df.groupby(['nome','code','tipo'])['score'].transform('sum')

df['Memb_sum']=np.where(df['tipo'] != 'Memb', 0, df['Memb_sum'])

df['Memb_sum']=df.groupby(['nome','code'])['Memb_sum'].transform('max')

You can perform group by first and later filter out values.

Output:

        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0

Selecting rows with groupby, summing columns and creating new column with the sum for all groupby elements

3 Answers3

Linked