
Is there a reason not to standardize all features by default? I realize it may not be necessary for, e.g., decision trees, but it does matter for certain algorithms such as KNN, SVM, and K-Means. Would there be any harm in just routinely doing this for all of my features?

Also, it seems to be the consensus that standardization is preferable to normalization. When would this not be a good idea?

Levon

1 Answer


Standardization and normalization, in my experience, have the most (positive) impact when your dataset consists of features with very different ranges (for instance, age vs. house price in dollars).
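To make the contrast concrete, here is a minimal sketch with scikit-learn; the age and house-price numbers below are invented purely for illustration:

```python
# Standardization vs. min-max normalization with scikit-learn.
# The age / house-price values are made up for illustration.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[25, 150_000],
              [40, 320_000],
              [58, 210_000],
              [33, 480_000]], dtype=float)  # columns: age, price in dollars

# Standardization: each column rescaled to zero mean and unit variance.
print(StandardScaler().fit_transform(X))

# Normalization (min-max scaling): each column rescaled into [0, 1].
print(MinMaxScaler().fit_transform(X))
```

Without some such rescaling, distance-based models like KNN, SVM, or K-Means would be dominated by the price column simply because its magnitude is so much larger.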

In my professional experience, while working on a project with car sensor data (time series), I noticed that normalization (min-max scaling), even when applied before a neural network, had a negative impact on the training process and, of course, on the final results. Admittedly, the sensor values were already very close to one another. It was an interesting result to note, considering that with time series most data scientists scale by default (they are neural networks in the end, as the theory goes).

In principle, standardization is the better choice when the dataset contains outliers: min-max normalization squashes everything into a fixed range, so a single extreme value compresses the remaining data into a narrow band and shrinks its spread. To my knowledge, this relative robustness to outliers is the main reason standardization tends to be favored over normalization.
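A quick way to see that compression effect, again with invented numbers:

```python
# Sketch of the outlier argument: one extreme value pins min-max scaling's
# maximum, squeezing all other points into a tiny band near 0. The numbers
# are made up for illustration.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])  # last value is an outlier

print(MinMaxScaler().fit_transform(x).ravel())
# -> [0.    0.001 0.002 0.003 1.   ]: the inliers collapse near 0.

print(StandardScaler().fit_transform(x).ravel())
# -> roughly [-0.50 -0.50 -0.50 -0.50  2.00]: values are not confined to a
#    fixed range, so the outlier shows up as a large z-score instead of
#    stretching the whole [0, 1] scale.
```

That said, neither scaler is truly outlier-proof; for heavily contaminated data, scikit-learn also provides RobustScaler, which centers and scales using the median and interquartile range.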

Three years ago, if someone had asked me this question, I would have said standardization is the way to go. Now I say: follow the principles, but test every hypothesis before jumping to a conclusion.

Timbus Calin