
I tried to use the following code to normalize a column in a Python data frame:

df['x_norm'] = df.apply(lambda x: (x['X'] - x['X'].mean()) / (x['X'].max() - x['X'].min()),axis=1)

but got the following error:

    df['x_norm'] = df.apply(lambda x: (x['X'] - x['X'].mean()) / (x['X'].max() - x['X'].min()),axis=1)
AttributeError: ("'float' object has no attribute 'mean'", u'occurred at index 0')

Does anyone know what I missed here? Thanks!

Edamame

2 Answers


I'm assuming you are using Pandas.

With axis=1 the lambda receives one row at a time, so x['X'] is a single float and has no .mean() method, which is what the error is telling you. Instead of applying to the entire DataFrame, call apply (see the documentation) only on the Series 'X', and pre-calculate the mean, max and min values. Something like this:

avg = df['X'].mean()                   # mean of column X
diff = df['X'].max() - df['X'].min()   # range (max - min) of column X
new_df = df['X'].apply(lambda x: (x - avg) / diff)  # normalize each value
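As a usage note (not part of the original answer), the normalized Series can be assigned straight back as the new column, and the same calculation can be written without apply at all; a minimal sketch, assuming the same df and column name 'X' as in the question:

df['x_norm'] = df['X'].apply(lambda x: (x - avg) / diff)
# Equivalent vectorized form, no apply needed (assumes column 'X' is numeric):
df['x_norm'] = (df['X'] - df['X'].mean()) / (df['X'].max() - df['X'].min())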

If you are looking to normalize the entire DataFrame, you can apply the same formula column-wise:

import numpy as np  # needed for np.mean, np.max, np.min
df.apply(lambda x: (x - np.mean(x)) / (np.max(x) - np.min(x)))
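
For illustration only, here is a self-contained sketch of that column-wise version on a small made-up numeric frame (the values below are invented, not taken from the question):

import numpy as np
import pandas as pd

# Toy numeric DataFrame; the values are purely for illustration.
toy = pd.DataFrame({'X': [5, 5, 7, 6, 8], 'Y': [2, 1, 7, 1, 5]})

# Mean-normalize every column: subtract the column mean, divide by its range.
normalized = toy.apply(lambda x: (x - np.mean(x)) / (np.max(x) - np.min(x)))
print(normalized)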
João Almeida

If you want to normalize the values in column X so that they sum to 1:

df['x_norm'] = df.X.div(df.X.sum())  # divide each value by the column total

Step by step:

In [65]: df
Out[65]:
   a  b  X
0  2  1  5
1  1  4  5
2  7  4  7
3  1  6  6
4  5  5  8
5  5  8  2
6  6  7  5
7  8  2  5
8  7  9  9
9  9  6  5

In [68]: df['x_norm'] = df.X.div(df.X.sum())

In [69]: df
Out[69]:
   a  b  X    x_norm
0  2  1  5  0.087719
1  1  4  5  0.087719
2  7  4  7  0.122807
3  1  6  6  0.105263
4  5  5  8  0.140351
5  5  8  2  0.035088
6  6  7  5  0.087719
7  8  2  5  0.087719
8  7  9  9  0.157895
9  9  6  5  0.087719

Check that the normalized values sum to 1:

In [70]: df.x_norm.sum()
Out[70]: 1.0
MaxU - stand with Ukraine