3

I am trying to implement mean normalization of rows in pandas: find the mean of every row, then subtract that mean from each element of the same row.

Code:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(4, 5)), columns=list('ABCDE'))
print(df)


    A   B   C   D   E
0  53  77  34  51  41
1  44  46   6  70  31
2  52  22  95  88  13
3  77  18  88  86  20


x = pd.DataFrame(df.mean(axis = 1),columns=['mean'])

for index,rows in df.iterrows():
  for i in range(len(x)):
     df.loc[index] = df.loc[index] - x.loc[i]
print (df)


Output:

     A   B   C   D   E
  0 NaN NaN NaN NaN NaN
  1 NaN NaN NaN NaN NaN
  2 NaN NaN NaN NaN NaN
  3 NaN NaN NaN NaN NaN

Any suggestions on what the mistake is?

jpp
data_person

2 Answers

2

You can use apply, which by default works column by column, and subtract the row means from each column:

df = df.apply(lambda x: x - df.mean(axis = 1))

Output:

      A     B     C     D     E
0   1.8  25.8 -17.2  -0.2 -10.2
1   4.6   6.6 -33.4  30.6  -8.4
2  -2.0 -32.0  41.0  34.0 -41.0
3  19.2 -39.8  30.2  28.2 -37.8
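
As a possible explanation of the NaN values in the question, and a sketch of an alternative that computes the row means only once: a row of df is labelled A..E, while a row of the one-column frame x is labelled 'mean', so pandas finds no matching labels when subtracting. The example below assumes the values printed in the question; the names row_means, misaligned and centered are illustrative only.

```python
import pandas as pd

# Values copied from the frame printed in the question, so the result is reproducible.
df = pd.DataFrame(
    [[53, 77, 34, 51, 41],
     [44, 46, 6, 70, 31],
     [52, 22, 95, 88, 13],
     [77, 18, 88, 86, 20]],
    columns=list("ABCDE"),
)

row_means = df.mean(axis=1)  # one mean per row, indexed 0..3

# Reproduce the question's subtraction for a single row: the labels
# A..E and 'mean' never match, so every entry of the result is NaN.
x = row_means.to_frame("mean")
misaligned = df.loc[0] - x.loc[0]
print(misaligned)

# axis=0 aligns row_means with the row index, so each row loses its own mean.
centered = df.sub(row_means, axis=0)
print(centered)
```

After centering, every row of centered should average to zero, up to floating-point rounding.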
Joe
0

You can perform this calculation in a vectorised fashion using numpy:

A = df.values  # underlying NumPy array of the original values

print(A)

[[11 31 78 55 71]
 [89 39 39 16 45]
 [26 10 85 68 93]
 [55 19 78 30 41]]

# [:, None] turns the row means into a column so they broadcast across each row
A = A - A.mean(axis=1)[:, None]

res = pd.DataFrame(A, index=df.index, columns=df.columns)

print(res)

      A     B     C     D     E
0 -38.2 -18.2  28.8   5.8  21.8
1  43.4  -6.6  -6.6 -29.6  -0.6
2 -30.4 -46.4  28.6  11.6  36.6
3  10.4 -25.6  33.4 -14.6  -3.6
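
The broadcasting step can also be written with keepdims=True instead of [:, None]. This is a minimal sketch, assuming the integer array shown in this answer; row_means and centered are illustrative names.

```python
import numpy as np

# Integer values taken from the array printed in this answer.
A = np.array([[11, 31, 78, 55, 71],
              [89, 39, 39, 16, 45],
              [26, 10, 85, 68, 93],
              [55, 19, 78, 30, 41]])

# keepdims=True keeps the means as shape (4, 1) rather than (4,),
# so they broadcast across the 5 columns of each row.
row_means = A.mean(axis=1, keepdims=True)
centered = A - row_means

print(row_means.shape)
print(centered)
```

Without the extra axis, a (4,) vector of means cannot broadcast against the (4, 5) array, and NumPy raises an error instead of subtracting row-wise.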
jpp