I am new to pandas and numpy. I want to calculate a weighted mean within each group. The following code, which I found online, works well for me.

import pandas as pd
import numpy as np
df = pd.DataFrame({'id':[0]*3+[1]*3,'se':[1]*2+[2]*2+[3]*2,'y':np.random.randn(6),'x':np.random.randn(6)})

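def wavg(group, avg_name, weight_name):
    # Weighted mean of column avg_name, using column weight_name as weights
    d = group[avg_name]
    w = group[weight_name]
    try:
        return (d * w).sum() / w.sum()
    except ZeroDivisionError:
        return np.nan

# Apply wavg to every (id, se) group and name the result column 'retx'
cc = df.groupby(['id', 'se']).apply(wavg, 'x', 'y').reset_index().rename(columns={0: 'retx'})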

However, I am confused about how to write a function like `wavg` for use with apply. What is `group` inside the wavg function? Can anyone explain it in detail?

Thanks in advance.

Zachary
  • See: https://stackoverflow.com/questions/62092600/how-does-apply-work-on-a-pandas-dataframe-groupby. `GroupBy.apply` passes the subset of the DataFrame for each group to your function. So `group` is a DataFrame containing a single unique combination of ['id', 'se']. If you are still a bit confused, look at the output of `df.groupby(['id', 'se']).apply(lambda group: print(group, '\n'))`. That should show you exactly what `group` refers to. – ALollz Jul 20 '20 at 18:54
    @ALollz, Thanks a lot. – Zachary Jul 21 '20 at 01:39
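
To make the comment above concrete, here is a minimal sketch of what `group` looks like and how `wavg` is called once per group. The fixed values for `x` and `y` are made up for illustration (the question uses random data), and the dictionary name `groups` is hypothetical. The `[['x', 'y']]` selection before `apply` is an addition not present in the question; it is assumed here only so that the grouping columns are not passed to the function, which behaves the same across recent pandas versions.

```python
import pandas as pd

# Fixed example data (made up for illustration); y is used as the weight
df = pd.DataFrame({
    'id': [0, 0, 0, 1, 1, 1],
    'se': [1, 1, 2, 2, 3, 3],
    'x':  [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'y':  [1.0, 3.0, 2.0, 2.0, 1.0, 1.0],
})

def wavg(group, avg_name, weight_name):
    # group is a DataFrame holding only the rows of one (id, se) combination
    d = group[avg_name]
    w = group[weight_name]
    return (d * w).sum() / w.sum()

# Iterating over the GroupBy object yields (key, sub-DataFrame) pairs;
# each sub-DataFrame is what apply hands to wavg as `group`
groups = {key: group for key, group in df.groupby(['id', 'se'])}
for key, group in groups.items():
    print(key)
    print(group, '\n')

# apply calls wavg(group, 'x', 'y') once per group and collects the scalars
# into a Series indexed by (id, se)
cc = (
    df.groupby(['id', 'se'])[['x', 'y']]
      .apply(wavg, 'x', 'y')
      .rename('retx')
      .reset_index()
)
print(cc)
```

Any extra positional arguments given to `apply` (here `'x'` and `'y'`) are forwarded to the function after `group`, which is why `wavg` takes three parameters.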
