I have a DataFrame df and I create gb = df.groupby("column1"). Now I would like to do the following:

x = gb.apply(lambda x: x["column2"].sum() / df["column2"].sum())

It works, but I would like to base everything on x rather than on both x and df. Ideally, I expected there to be a function such as x.get_source_df, so that my solution would be:

x = gb.apply(lambda x: x["column2"].sum() / x.get_source_df()["column2"].sum())

and in that case I could save this lambda function in a dictionary which I could use for any df. Is it possible?

Jason

2 Answers

You should not use apply here. A faster, vectorized approach is:

df.groupby('column1')['column2'].sum().div(df['column2'].sum())

It works for more than one column too.
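
To illustrate the multi-column case, and the question's goal of storing the computation for reuse with any DataFrame, here is a minimal sketch. The sample data, the helper name group_share, and the metrics dictionary are hypothetical and introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "column1": ["a", "a", "b", "b"],
    "column2": [1, 3, 2, 4],
    "column3": [10, 10, 30, 50],
})


def group_share(frame, by, cols):
    """Return each group's sum as a fraction of the whole column's sum."""
    return frame.groupby(by)[cols].sum().div(frame[cols].sum())


# Callables that take the DataFrame itself can be stored and reused with any df.
metrics = {
    "share_col2": lambda d: group_share(d, "column1", "column2"),
    "share_all": lambda d: group_share(d, "column1", ["column2", "column3"]),
}

share_col2 = metrics["share_col2"](df)  # Series indexed by column1
share_all = metrics["share_all"](df)    # DataFrame, one column per input column
```

Because each stored function receives the DataFrame rather than the GroupBy object, it has access to both the groups and the overall totals, so no get_source_df-style accessor is needed.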

ansev

From your explanation, I am not sure whether you want to divide by the sum of each group or by the sum of the entire DataFrame. I assume you want to divide by the sum of each group.

Data:

import pandas as pd

df = pd.DataFrame({'name':['a']*5+['b']*5,
                   'year':[2001,2002,2003,2004,2005]*2,
                   'val1':[1,2,3,4,5,None,7,8,9,10],
                   'val2':[21,22,23,24,25,26,27,28,29,30]})

Use transform to broadcast each group's sum back onto its rows, then simply divide column by column:

df['sum'] = df.groupby('name')['val1'].transform('sum')
df['weight'] = df['val1']/df['sum']
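
As a quick check of what this produces, the following sketch (using a trimmed copy of the data above, and assuming the default NaN-skipping behaviour of sum) computes the weights in one step and confirms that they add up to 1 within each group. The variable name check is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a'] * 5 + ['b'] * 5,
                   'val1': [1, 2, 3, 4, 5, None, 7, 8, 9, 10]})

# Divide each value by its own group's sum, without a helper column.
df['weight'] = df['val1'] / df.groupby('name')['val1'].transform('sum')

# Per-group total of the weights; the row with a missing val1 keeps a NaN weight.
check = df.groupby('name')['weight'].sum()
```

Note that these are row-level weights within each group, which differs from the question's ratio of each group's sum to the overall sum.
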
PTQuoc