
I tried to use the following code to normalize a column in a Python data frame:

df['x_norm'] = df.apply(lambda x: (x['X'] - x['X'].mean()) / (x['X'].max() - x['X'].min()),axis=1)

but got the following error:

    df['x_norm'] = df.apply(lambda x: (x['X'] - x['X'].mean()) / (x['X'].max() - x['X'].min()),axis=1)
AttributeError: ("'float' object has no attribute 'mean'", u'occurred at index 0')

Does anyone know what I missed here? Thanks!

Edamame

2 Answers


I'm assuming you are using Pandas.

With axis=1 the lambda receives one row at a time, so x['X'] is a single float and has no .mean() method, which is what the error is telling you. Instead of applying to the entire DataFrame, call apply (see the documentation) only on the Series 'X', and pre-calculate the mean, max and min values. Something like this:

avg = df['X'].mean()                   # mean of column X
diff = df['X'].max() - df['X'].min()   # range (max - min) of column X
new_df = df['X'].apply(lambda x: (x - avg) / diff)  # normalize each value
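As a usage note (not part of the original answer), the normalized Series can be assigned straight back as the new column, and the same calculation can be written without apply at all; a minimal sketch, assuming the same df and column name 'X' as in the question:

df['x_norm'] = df['X'].apply(lambda x: (x - avg) / diff)
# Equivalent vectorized form, no apply needed (assumes column 'X' is numeric):
df['x_norm'] = (df['X'] - df['X'].mean()) / (df['X'].max() - df['X'].min())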

If you are looking to normalize the entire DataFrame, you can apply the same formula column-wise:

import numpy as np  # needed for np.mean, np.max, np.min
df.apply(lambda x: (x - np.mean(x)) / (np.max(x) - np.min(x)))
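
For illustration only, here is a self-contained sketch of that column-wise version on a small made-up numeric frame (the values below are invented, not taken from the question):

import numpy as np
import pandas as pd

# Toy numeric DataFrame; the values are purely for illustration.
toy = pd.DataFrame({'X': [5, 5, 7, 6, 8], 'Y': [2, 1, 7, 1, 5]})

# Mean-normalize every column: subtract the column mean, divide by its range.
normalized = toy.apply(lambda x: (x - np.mean(x)) / (np.max(x) - np.min(x)))
print(normalized)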
João Almeida

If you want to normalize the values in column X so that they sum to 1:

df['x_norm'] = df.X.div(df.X.sum())  # divide each value by the column total

Step by step:

In [65]: df
Out[65]:
   a  b  X
0  2  1  5
1  1  4  5
2  7  4  7
3  1  6  6
4  5  5  8
5  5  8  2
6  6  7  5
7  8  2  5
8  7  9  9
9  9  6  5

In [68]: df['x_norm'] = df.X.div(df.X.sum())

In [69]: df
Out[69]:
   a  b  X    x_norm
0  2  1  5  0.087719
1  1  4  5  0.087719
2  7  4  7  0.122807
3  1  6  6  0.105263
4  5  5  8  0.140351
5  5  8  2  0.035088
6  6  7  5  0.087719
7  8  2  5  0.087719
8  7  9  9  0.157895
9  9  6  5  0.087719

Check that the normalized values sum to 1:

In [70]: df.x_norm.sum()
Out[70]: 1.0
MaxU - stand with Ukraine