I need to group a Pandas DataFrame by date and then take a weighted average of given values. Here's how it's currently done, using the margin value as an example (it works perfectly until NaN values appear):
df = orders.copy()
# Create new columns as required
df['margin_WA'] = df['net_margin'].astype(float)  # original data is str or Decimal

def group_wa():
    return lambda num: np.average(num, weights=df.loc[num.index, 'order_amount'])

agg_func = {
    'margin_WA': group_wa(),  # agg_func includes WAs for other elements
}
result = df.groupby('order_date').agg(agg_func)
result['margin_WA'] = result['margin_WA'].astype(str)
When any of the 'net_margin' fields contain NaN, the weighted average for that group comes out as NaN. I can't seem to dropna() or filter by pd.notnull when creating the new columns, and I don't know where to build a masked array to avoid passing NaN to the group_wa function (like suggested here). How do I ignore NaN in this case?
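For reference, here is a minimal reproducer (the data is made up, column names match the real frame) that shows the behaviour: any group containing a missing net_margin produces a NaN weighted average, because np.average propagates NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing net_margin on 2023-01-01
orders = pd.DataFrame({
    'order_date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'net_margin': ['0.10', None, '0.20', '0.30'],  # str in real data
    'order_amount': [100.0, 50.0, 200.0, 100.0],
})

df = orders.copy()
df['margin_WA'] = df['net_margin'].astype(float)  # None becomes NaN

def group_wa(num):
    # Weighted average of the group's values, weighted by order_amount
    return np.average(num, weights=df.loc[num.index, 'order_amount'])

result = df.groupby('order_date').agg({'margin_WA': group_wa})
print(result)
# The 2023-01-01 group comes out as NaN; 2023-01-02 is a proper WA
```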