Normalize dataframe series by group

Question

In a dataframe like this

id    name    month    count
111   a       1        5
112   b       1        4
113   c       1        6
111   a       2        1
112   b       2        7
113   c       2        6

I want to normalize the values for a, b and c respectively.

The best approach is likely to groupby and aggregate, something like this:

df_normalized = df.groupby(["id", "name"]).agg(normalized_count=("count", "lambda x: (x - x.mean()) / x.std())")).reset_index(drop=True)

This however results in an AttributeError.

AttributeError: 'SeriesGroupBy' object has no attribute 'lambda x: (x - x.mean()) / x.std())'

Here is a similar way by using transform, which unfortunately doesn't seem to respect the "a,b,c".

Normalize DataFrame by group

Another way, in the same thread, is rather ugly, and uses multiple steps:

means_stds = df.groupby('indx')['a0'].agg(['mean','std']).reset_index()
df = df.merge(means_stds,on='indx')
df['a0_normalized'] = (df['a0'] - df['mean']) / df['std']

Is there a way to achieve this using groupby?

The one in the answer below works like a charme – Dustin Feb 23 '21 at 12:03 — Dustin, Feb 23 '21 at 12:03

jezrael · Accepted Answer · 2021-02-23T12:03:41.190

Use GroupBy.transform with lambda function:

df['c1_normalized'] = (df.groupby(["id", "name"])['count']
                         .transform(lambda x: (x - x.mean()) / x.std()))

For compare is changed your second solution:

means_stds = df.groupby(["id", "name"])['count'].agg(['mean','std']).reset_index()
df = df.merge(means_stds,on=["id", "name"])
df['c2_normalized'] = (df['count'] - df['mean']) / df['std']
print (df)
    id name  month  count  c1_normalized  mean       std  c2_normalized
0  111    a      1      5       0.707107   3.0  2.828427       0.707107
1  111    a      2      1      -0.707107   3.0  2.828427      -0.707107
2  112    b      1      4      -0.707107   5.5  2.121320      -0.707107
3  112    b      2      7       0.707107   5.5  2.121320       0.707107
4  113    c      1      6            NaN   6.0  0.000000            NaN
5  113    c      2      6            NaN   6.0  0.000000            NaN

Ah thank you, adding the transform column didn't come to my mind. Very elegant — Dustin, Feb 23 '21 at 12:04

Normalize dataframe series by group

1 Answers1