I am looking to do outlier detection on some system time series data using Isolation Forest. The scales of the features in my case are quite varied. My gut tells me that I should normalize the data, but I don't recall this requirement in the original Isolation Forest paper. Any guidance is appreciated.
This issue is already discussed [here](https://stackoverflow.com/questions/8961586/do-i-need-to-normalize-or-scale-data-for-randomforest-r-package). – Mario Feb 13 '21 at 15:39
2 Answers
I don't think it's a good idea to normalise your data for isolation forest. Anomaly detection in general doesn't need normalisation. By definition, outlier/anomaly detection identifies data points that are different from, and fewer than, the majority of points. Normalising will bring all these points within a smaller scale, and that can't be good for the "difference" between points that we are trying to detect, which is the basis of outlier detection itself.
Coming to isolation forest, the variation between scales of features shouldn't matter. If this is your only concern with respect to normalisation, you can always set the property "max_features" to 1. Isolation forest is an ensemble decision tree algorithm, and "max_features" is the number of features drawn to train each tree (base estimator). If you set this to the integer 1, there'll be only one feature involved with each tree, so the difference in scale would never matter. (Note that in scikit-learn the default is the float 1.0, which means all features are drawn for each tree, not the integer 1.)

Agreed on IsolationForest, don't normalize the data. But for many distance-based methods, like KMeans or DBSCAN, normalization is important; otherwise the feature scales will create an arbitrary implicit weighting of feature importances, which is rarely what is wanted. – Jon Nordby Oct 16 '20 at 21:21
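To illustrate the comment's point about distance-based methods, here is a minimal sketch using only the Python standard library. The two features (latency in seconds, memory in bytes), the three points, and the assumed feature ranges are all hypothetical and chosen only to make the effect visible; this is not KMeans or DBSCAN itself, just the Euclidean distance those methods typically rely on.

```python
import math

# Hypothetical points: (latency in seconds, memory in bytes).
query = (0.10, 1_000_000)
candidates = {
    "a": (0.90, 1_000_500),  # very different latency, nearly identical memory
    "b": (0.11, 1_010_000),  # nearly identical latency, slightly more memory
}

# Assumed feature ranges used for min-max scaling (illustrative only).
ranges = ((0.0, 1.0), (0.0, 8e9))

def scale(point):
    """Min-max scale each feature to [0, 1] using the assumed ranges."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, ranges))

def nearest(target, points):
    """Return the name of the point with the smallest Euclidean distance."""
    return min(points, key=lambda name: math.dist(points[name], target))

# On raw values the byte-valued feature dominates the distance.
raw_nearest = nearest(query, candidates)

# After scaling, both features contribute on a comparable footing.
scaled_nearest = nearest(scale(query), {k: scale(p) for k, p in candidates.items()})

print(raw_nearest, scaled_nearest)
```

On the raw values the memory column swamps the latency column, so the nearest neighbour is decided almost entirely by memory; after scaling, the answer changes. That is the implicit weighting the comment warns about.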
Normalization is unnecessary. However, it will neither hurt nor help (assuming linear scaling).
Isolation forests work by splitting the data on a random feature at a random point between that feature's min and max. So whether the scale is 0-1 or 0-100000 makes no difference, since it still takes the same number of splits to create the tree.
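The argument above can be checked with a small sketch. This is a simplified, pure-Python imitation of the isolation idea (random feature, random split between min and max), not scikit-learn's implementation: it skips subsampling and score normalisation, and the helper names `path_length` and `mean_path_length`, as well as the synthetic data, are assumptions made for illustration.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate `point` from `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))        # pick a random feature
    values = [row[feature] for row in data]
    lo, hi = min(values), max(values)
    if lo == hi:                               # nothing to split on this feature
        return depth
    split = rng.uniform(lo, hi)                # random point between min and max
    # Follow only the side of the split that contains `point`.
    if point[feature] < split:
        data = [row for row in data if row[feature] < split]
    else:
        data = [row for row in data if row[feature] >= split]
    return path_length(point, data, rng, depth + 1, max_depth)

def mean_path_length(point, data, n_trees=25, seed=0):
    """Average path length over several random trees (same seed, same splits)."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees

# Synthetic data (an assumption): 100 Gaussian inliers plus one obvious outlier.
gen = random.Random(1)
raw = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(100)] + [(8.0, 8.0)]

# Linearly rescale the second feature only. A power of two keeps the
# floating-point arithmetic exact, so the comparison below is reproducible.
scaled = [(x, y * 1024) for x, y in raw]

raw_paths = [mean_path_length(p, raw) for p in raw]
scaled_paths = [mean_path_length(p, scaled) for p in scaled]

# Same path length for every point on both scales.
print(raw_paths == scaled_paths)
# The outlier (last point) is isolated in fewer splits than the average point.
print(raw_paths[-1] < sum(raw_paths) / len(raw_paths))
```

Because the split point is drawn relative to each feature's own min and max, stretching one feature stretches its split points by the same factor, and every point ends up on the same side of every split. With an arbitrary (non power-of-two) factor, floating-point rounding could in principle flip a borderline comparison, but the path lengths would still be essentially the same.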