1

I have this dataframe:

nome       code  tipo   score
Alexandre   AAA  Frads  4000
Alexandre   AAA  Memb   10000
Alexandre   AAA  Memb   20000
Bruno       BBB  Dans   10000
Bruno       BBB  Grap   4000

Values available in this Google Sheets

I need to create a new column summing the rows with same nome and code where tipo = 'Memb', in a way that it looks like this:

enter image description here

I tried groupby with transform('sum') however it is getting me the wrong result.

df['score'].loc[df['tipo'] == "Memb"]=df[['nome','code','score']].groupby(['nome','code'])['score'].transform('sum')

enter image description here

What am I missing?

aabujamra
  • 4,494
  • 13
  • 51
  • 101
  • 4
    Please include dataframes that I can copy and paste, I could work on a solution right now but I don't want to type everything in... check out [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – timgeb May 31 '20 at 11:43
  • Oh, sorry about that, added a Google Sheets link, does it solve it? – aabujamra May 31 '20 at 11:47
  • https://docs.google.com/spreadsheets/d/1BRibP52tqoHEHr1xcJgx0J-ecmaJ6NA24rb6p9IZlNM/edit#gid=0 – aabujamra May 31 '20 at 11:47
  • It's better in the question itself but it prevents having to type it in, thanks. – timgeb May 31 '20 at 11:59
  • 1
    I've posted an answer, please check it if that solves your problem – EXODIA May 31 '20 at 12:20

3 Answers3

3

For improve performance is possible replace score to 0 values by Series.mask and then use GroupBy.transform with sum:

df['Memb_sum']  = (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0))
                     .groupby(['nome','code'])['score']
                     .transform('sum'))
print (df)
        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0

Details:

print (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0)))

        nome code   tipo  score
0  Alexandre  AAA  Frads      0
1  Alexandre  AAA   Memb  10000
2  Alexandre  AAA   Memb  20000
3      Bruno  BBB   Dans      0
4      Bruno  BBB   Grap      0   
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    By the way, it would be good if there was a version of `transform` that got passed the whole group, similar to `apply`. – timgeb May 31 '20 at 13:19
2

You can try this.

Set 'tipo' as index using df.set_index, then group values with similar nome and code using df.groupby and use df.transform and sum of those indices which are equal to Memb

df['Memb_sum'] =  (df.set_index('tipo').
                     groupby(['nome','code']).score.
                     transform(lambda x:x.loc[x.index=='Memb'].sum()).
                     values)

Output:

        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0
Ch3steR
  • 20,090
  • 4
  • 28
  • 58
1
import numpy as np

df['Memb_sum']=df.groupby(['nome','code','tipo'])['score'].transform('sum')

df['Memb_sum']=np.where(df['tipo'] != 'Memb', 0, df['Memb_sum'])

df['Memb_sum']=df.groupby(['nome','code'])['Memb_sum'].transform('max')

You can perform group by first and later filter out values.

Output:

        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0
EXODIA
  • 908
  • 3
  • 10
  • 28