
I am trying to normalize my features (the Xtrain matrix, which is 250 by 7), and this is what I've done:

mean_tr = np.mean(Xtrain, axis=0)
sd_tr = np.std(Xtrain, axis=0)
feature1 = (Xtrain[:,0] - mean_tr[0]) / sd_tr[0]
feature2 = (Xtrain[:,1] - mean_tr[1]) / sd_tr[1]
feature3 = (Xtrain[:,2] - mean_tr[2]) / sd_tr[2]
feature4 = (Xtrain[:,3] - mean_tr[3]) / sd_tr[3]
feature5 = (Xtrain[:,4] - mean_tr[4]) / sd_tr[4]
feature6 = (Xtrain[:,5] - mean_tr[5]) / sd_tr[5]
feature7 = (Xtrain[:,6] - mean_tr[6]) / sd_tr[6]

But something's wrong! My features are not within the 0–1 range; some are greater than 1. What am I doing wrong?

Azarang
  • You're not doing anything wrong. Scaling by the std doesn't fix the range to [0, 1]; it scales the data so the variance is 1. You want a min-max scaler, where you subtract the min and divide by the range. Details at https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-scaler – BenI Sep 26 '20 at 19:13
  • See the answers of this question for other possible approaches: https://stackoverflow.com/questions/21030391/how-to-normalize-an-array-in-numpy – G. Sliepen Sep 27 '20 at 09:27

1 Answer


You are doing standardization, i.e., computing Z-scores, which have zero mean and unit variance.
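A quick check (with a random toy matrix standing in for Xtrain, since the original data isn't shown) confirms this: the standardized columns have mean ≈ 0 and std ≈ 1, but their values are not confined to [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(250, 7))  # toy stand-in for Xtrain

# What the question's code computes, vectorized over all columns at once:
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(Z.mean(axis=0))     # ~0 for every column
print(Z.std(axis=0))      # ~1 for every column
print(Z.min(), Z.max())   # values fall well outside [0, 1]
```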

For normalization, you have to use another formula:

(X - X_min) / (X_max - X_min)

Read more about it here.
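Applying that formula column-wise in NumPy (again with a toy matrix in place of Xtrain) forces every column into [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(250, 7))  # toy stand-in for Xtrain

# Min-max normalization: each column is mapped to the [0, 1] range
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_norm.min(axis=0))  # 0.0 for every column
print(X_norm.max(axis=0))  # 1.0 for every column
```

The same result can be obtained with `sklearn.preprocessing.MinMaxScaler`, as suggested in the comments.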

gtancev