Pandas groupby - divide by the sum of all groups

Question

I have a DataFrame df and I create gb = df.groupby("column1"). Now I would like to do the following:

x = gb.apply(lambda x: x["column2"].sum() / df["column2"].sum())

It works, but I would like to base everything on x alone, not on both x and df. Ideally, I expected there to be a function like x.get_source_df, so that my solution would be:

x = gb.apply(lambda x: x["column2"].sum() / x.get_source_df()["column2"].sum())

and in that case I could save this lambda function in a dictionary which I could use for any df. Is it possible?

https://stackoverflow.com/help/minimal-reproducible-example – Panda Kim, Nov 19 '22 at 12:50

score 1 · Answer 1 · answered Nov 19 '22 at 13:00

You should not use apply here. You may find this interesting: the optimal method would be

df.groupby('column1')['column2'].sum().div(df['column2'].sum())

It works for more than one column too.
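As a minimal sketch of the multi-column case, the same expression can be wrapped in a function that takes the DataFrame as an argument, which makes it storable in a dictionary as the question asks. The sample data, the function name group_share, and the funcs dictionary are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; column1/column2 follow the question, column3 is invented.
df = pd.DataFrame({
    "column1": ["a", "a", "b", "b"],
    "column2": [1.0, 3.0, 2.0, 4.0],
    "column3": [10.0, 10.0, 30.0, 50.0],
})


def group_share(frame, by, cols):
    """Return each group's sum divided by the overall sum, column by column."""
    # frame[cols].sum() is a Series indexed by column name, so .div aligns on columns.
    return frame.groupby(by)[cols].sum().div(frame[cols].sum())


# Hypothetical registry of functions that work on any DataFrame with these columns.
funcs = {
    "share": lambda frame: group_share(frame, "column1", ["column2", "column3"]),
}

shares = funcs["share"](df)
print(shares)
```

Because the function receives the whole DataFrame rather than one group at a time, it does not need any way to look up the source frame from inside the groupby.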

answered Nov 19 '22 at 13:00

ansev


score 1 · Answer 2 · answered Nov 20 '22 at 01:32

It is not clear to me from your explanation whether you want to divide by the sum of each group or by the sum of the entire DataFrame. I assume you want to divide by the sum of each group.

Data:

import pandas as pd

df = pd.DataFrame({'name':['a']*5+['b']*5,
                   'year':[2001,2002,2003,2004,2005]*2,
                   'val1':[1,2,3,4,5,None,7,8,9,10],
                   'val2':[21,22,23,24,25,26,27,28,29,30]})

Use transform to broadcast each group's sum back onto its rows, then simply divide column by column:

df['sum'] = df.groupby('name')['val1'].transform('sum')
df['weight'] = df['val1']/df['sum']
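A quick way to sanity-check the result is to confirm that the weights add up to 1 within each group. The sketch below reuses a trimmed copy of the data above; the variable names group_sum and check are only illustrative. It relies on pandas skipping missing values by default when summing, so the row where val1 is missing gets a missing weight and the remaining weights in that group still sum to 1.

```python
import pandas as pd

# Trimmed copy of the answer's data: only the columns needed for the check.
df = pd.DataFrame({'name': ['a'] * 5 + ['b'] * 5,
                   'val1': [1, 2, 3, 4, 5, None, 7, 8, 9, 10]})

# Per-row group total; NaN values are skipped by the sum.
group_sum = df.groupby('name')['val1'].transform('sum')
df['weight'] = df['val1'] / group_sum

# Within each group the weights should add up to 1 (NaN weights are skipped).
check = df.groupby('name')['weight'].sum()
print(check)
```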
