2

I am trying to calculate normalized scores for my dataset using mean normalization. When I write (X - np.mean(X))/np.std(X), I get different scores than when I write (X - X.mean())/X.std().

The problem seems to come from the calculation of the standard deviation: X.std() returns one value and np.std(X) returns a different one. Why is this happening?

desertnaut
Matt

1 Answer

5

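Pandas by default divides by N-1 (ddof=1, the sample standard deviation, based on the unbiased estimator of the variance), whereas NumPy by default divides by N (ddof=0, the population standard deviation).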
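Written out, the two defaults look like this. The symbols $s_{\text{pandas}}$ and $s_{\text{numpy}}$ are just labels chosen here for illustration, with $\bar{x}$ the mean of the $N$ values:

```latex
s_{\text{pandas}} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2},
\qquad
s_{\text{numpy}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2},
\qquad
\frac{s_{\text{pandas}}}{s_{\text{numpy}}} = \sqrt{\frac{N}{N-1}}
```

So the gap is largest for small N and shrinks as the dataset grows.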

To make them behave the same, pass ddof=1 to numpy.std() (or, going the other way, pass ddof=0 to X.std() in pandas).
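A minimal sketch of the difference and the fix, assuming X is a pandas Series. The sample values and the variable names are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
X = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
arr = X.to_numpy()

pandas_default = X.std()            # pandas default: ddof=1, divides by N-1
numpy_default = np.std(arr)         # NumPy default: ddof=0, divides by N

# Either of these makes the two libraries agree.
numpy_sample = np.std(arr, ddof=1)  # NumPy matching the pandas default
pandas_population = X.std(ddof=0)   # pandas matching the NumPy default

print(pandas_default, numpy_default)
print(np.isclose(numpy_sample, pandas_default))
print(np.isclose(pandas_population, numpy_default))
```

Whichever convention you pick, use the same ddof in both places so the normalized scores match.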

Different std in pandas vs numpy