Given a dataframe like this:
id   name  month  count
111  a     1      5
112  b     1      4
113  c     1      6
111  a     2      1
112  b     2      7
113  c     2      6
I want to normalize the count values separately for a, b and c, i.e. within each id/name group rather than across the whole column.
The best approach is probably to group by id and name and aggregate, something like this:
df_normalized = df.groupby(["id", "name"]).agg(normalized_count=("count", "lambda x: (x - x.mean()) / x.std())")).reset_index(drop=True)
This, however, results in an AttributeError:
AttributeError: 'SeriesGroupBy' object has no attribute 'lambda x: (x - x.mean()) / x.std())'
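For context, the error seems to come from the quotes around the lambda: a string passed to agg is looked up as the name of a method on the groupby object, so the whole quoted expression is treated as a (nonexistent) attribute. The following is a minimal sketch of that difference, rebuilding the sample frame from above; the output column names mean_count and count_range are made up for illustration. Note also that agg expects one value per group, which the normalization lambda does not return.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "id": [111, 112, 113, 111, 112, 113],
    "name": ["a", "b", "c", "a", "b", "c"],
    "month": [1, 1, 1, 2, 2, 2],
    "count": [5, 4, 6, 1, 7, 6],
})

grouped = df.groupby(["id", "name"])

# A string is interpreted as the name of an existing groupby method.
by_name = grouped.agg(mean_count=("count", "mean"))

# Custom logic is passed as a callable (no quotes); for agg it should
# reduce each group to a single value.
by_callable = grouped.agg(count_range=("count", lambda x: x.max() - x.min()))
```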
A similar approach in another thread uses transform, which unfortunately doesn't seem to respect the grouping into a, b and c.
Another way, from the same thread, is rather ugly and takes multiple steps:
means_stds = df.groupby('indx')['a0'].agg(['mean','std']).reset_index()
df = df.merge(means_stds,on='indx')
df['a0_normalized'] = (df['a0'] - df['mean']) / df['std']
Is there a way to achieve this in a single groupby step?
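One possible single-step version, offered as a sketch rather than a definitive answer: select the count column on the grouped frame and call transform with the lambda as a real callable. It assumes the sample frame shown at the top; the column name normalized_count is chosen for illustration. In this sketch the mean and standard deviation are computed within each (id, name) group, and the result keeps the original row order.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "id": [111, 112, 113, 111, 112, 113],
    "name": ["a", "b", "c", "a", "b", "c"],
    "month": [1, 1, 1, 2, 2, 2],
    "count": [5, 4, 6, 1, 7, 6],
})

# transform returns one value per input row, computed within each group,
# so the result can be assigned straight back as a new column.
df["normalized_count"] = (
    df.groupby(["id", "name"])["count"]
      .transform(lambda x: (x - x.mean()) / x.std())
)
```

Series.std uses the sample standard deviation (ddof=1) by default. A group whose values are all identical, like c here, has a standard deviation of 0 and therefore comes out as NaN.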