
Why does standardization with sklearn.preprocessing.StandardScaler in Python differ from zscore in Matlab?

Example with sklearn.preprocessing in Python:

>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> scaler.fit(data)
>>> print(scaler.mean_)
[ 0.5  0.5]
>>> print(scaler.var_)
[0.25 0.25]
>>> print(scaler.transform(data))
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]

The same example in Matlab with the zscore function:

>> data = [[0, 0]; [0, 0]; [1, 1]; [1, 1]];
>> [Sd_data,mean,stdev] = zscore(data)

    Sd_data =
   -0.8660   -0.8660
   -0.8660   -0.8660
    0.8660    0.8660
    0.8660    0.8660

    mean =
    0.5000    0.5000

    stdev =
    0.5774    0.5774    
  • Related: https://stackoverflow.com/questions/27600207/why-does-numpy-std-give-a-different-result-to-matlab-std – rayryeng Mar 07 '18 at 20:14

1 Answer


It appears the issue lies with the degrees of freedom (ddof, the correction factor used when estimating the standard deviation), which is 0 by default in StandardScaler, whereas Matlab's zscore uses the sample standard deviation (ddof=1) by default.
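A quick numpy check (not part of the original answer, just a sketch using the data from the question) shows the two conventions side by side:

import numpy as np

data = np.array([[0, 0], [0, 0], [1, 1], [1, 1]], dtype=float)

# Population standard deviation (ddof=0), what StandardScaler uses
print(data.std(axis=0, ddof=0))  # [0.5 0.5]

# Sample standard deviation (ddof=1), what Matlab's zscore uses by default
print(data.std(axis=0, ddof=1))  # [0.57735027 0.57735027]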

As an alternative, scipy.stats's zscore function allows you to control this parameter when scaling:

from scipy.stats import zscore

data = [[0, 0], [0, 0], [1, 1], [1, 1]]

# ddof=1 matches Matlab's default (sample standard deviation)
zscore(data, ddof=1)
array([[-0.8660254, -0.8660254],
       [-0.8660254, -0.8660254],
       [ 0.8660254,  0.8660254],
       [ 0.8660254,  0.8660254]])

This gives the same output as the Matlab function. Conversely, calling zscore with ddof=0 reproduces the output of the StandardScaler.
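If you would rather avoid scipy, a minimal numpy sketch (again assuming the data array from the question) applies the same ddof=1 correction by hand:

import numpy as np

data = np.array([[0, 0], [0, 0], [1, 1], [1, 1]], dtype=float)

# Standardize with the sample standard deviation (ddof=1), as Matlab's zscore does
scaled = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
print(scaled)
# [[-0.8660254 -0.8660254]
#  [-0.8660254 -0.8660254]
#  [ 0.8660254  0.8660254]
#  [ 0.8660254  0.8660254]]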

cs95