
I have the following code

import pandas as pd
from sklearn.preprocessing import StandardScaler
import numpy as np

# df is assumed to already hold the Iris data, loaded elsewhere (e.g. with pd.read_csv)
df.columns = ['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)  # drops the empty line at file-end

X = df.iloc[:, 0:4].values  # feature columns (.ix is deprecated; .iloc works the same here)
y = df.iloc[:, 4].values    # class labels

Next I scale the data and compute the mean of each column:

X_std = StandardScaler().fit_transform(X)
mean_vec = np.mean(X_std, axis=0)

What I do not get is that my output is this:

[ -4.73695157e-16  -6.63173220e-16   3.31586610e-16  -2.84217094e-16]

What I do not understand is how these values can be anything other than 0. If I scale the data, the mean should be exactly 0, right?

Could anyone explain to me what happens here?


1 Answer


In practice those values are so close to 0 that you can consider them to be 0.

The scaler tries to set the mean to zero, but due to the limits of floating-point representation it can only get the mean really close to 0.

Check this question on the precision of floating-point arithmetic.

Also interesting is the concept of machine epsilon, which for a 64-bit float is about 2.22e-16.
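
As a quick sanity check, here is a minimal sketch (using made-up feature values rather than the actual Iris data) that compares the residual means against machine epsilon and confirms they are zero within tolerance:

import numpy as np
from sklearn.preprocessing import StandardScaler

# toy stand-in for the Iris features (hypothetical values)
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [6.2, 3.4, 5.4, 2.3],
              [5.9, 3.0, 5.1, 1.8]])

X_std = StandardScaler().fit_transform(X)
mean_vec = np.mean(X_std, axis=0)

print(mean_vec)                      # tiny residuals, on the order of 1e-16
print(np.finfo(np.float64).eps)      # machine epsilon for float64: ~2.22e-16
print(np.allclose(mean_vec, 0.0))    # True: the means are zero within tolerance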

  • Why would it be non-zero at all? Wouldn't it just require subtracting the actual mean as calculated from the target feature array? – sandyp Jun 02 '17 at 21:42
  • @sandyp you have to understand that subtracting the mean doesn't have infinite precision, as such the error between the actual mean and the value subtracted will be the new mean. – João Almeida Jun 03 '17 at 17:56
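
A tiny illustration of that last comment (arbitrary values, not the Iris data): subtracting the computed mean generally leaves a rounding residue rather than an exact zero.

import numpy as np

x = np.array([0.1, 0.2, 0.3])   # arbitrary values
centered = x - x.mean()         # subtract the mean computed in finite precision
print(centered.mean())          # typically a tiny nonzero value, not exactly 0.0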