
The data I am currently using varies between 0.5 and 1.0, with most values clustered around 0.5-0.6 and a few values above that. I am then using a random forest as a classifier, and I was wondering what would be the best way to normalize these values? Or is there no need to normalize?

Currently I just use the following; am I missing a trick?

RandomForestClassifier(random_state=42)
Peter
  • Please be specific about your problem. You may also take a look at the scikit-learn [documentation](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier) for the default parameters of RandomForestClassifier – chrisckwong821 Sep 09 '17 at 08:15
  • First of all, you can make a plot whose x-axis is the data range (say 0.5-1.0 in your case), plotting the points in one class at y = 0 and the points in the other class at y = 1, just to see the overall distribution (see the sketch below). – nos Sep 09 '17 at 18:38
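A minimal sketch of the kind of plot suggested in the comment above (the data here are synthetic and the 0.7 cut-off is purely a placeholder, not something taken from the question):

```python
# Strip-style plot: feature values on the x-axis, class label (0/1) on the y-axis.
# Synthetic stand-in data; replace x and y with your own feature and labels.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.0, size=200)   # values in the 0.5-1.0 range from the question
y = (x > 0.7).astype(int)             # placeholder labels, purely for illustration

plt.scatter(x, y, alpha=0.3)
plt.yticks([0, 1], ["class 0", "class 1"])
plt.xlabel("feature value")
plt.title("Class membership across the data range")
plt.show()
```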

1 Answer


Random Forest is invariant to monotonic transformations of individual features: translating or rescaling a feature on its own will not change anything for the Random Forest.

No, scaling or normalization is not necessary for random forests.

  • By its nature, RF is largely immune to the convergence and numerical-precision issues that can sometimes trip up the algorithms used in logistic and linear regression, as well as neural networks. Because of this, you don't need to transform variables to a common scale the way you might for a NN.
  • You don't get any analogue of a regression coefficient, which measures the relationship between each predictor variable and the response. Because of this, you also don't need to think about how to interpret such coefficients, something that is affected by the measurement scales of the variables.
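To see the invariance concretely, here is a minimal sketch (with synthetic data, not taken from the answer) that fits the same RandomForestClassifier on raw and on min-max-scaled features; because min-max scaling is a monotonic per-feature transform, the fitted trees pick equivalent splits and the predictions should agree:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic classification problem as a stand-in for the asker's data.
X, y = make_classification(n_samples=500, n_features=5, random_state=42)

# Fit on the raw features.
clf_raw = RandomForestClassifier(random_state=42).fit(X, y)

# Fit on min-max-scaled features (a monotonic, per-feature transformation).
X_scaled = MinMaxScaler().fit_transform(X)
clf_scaled = RandomForestClassifier(random_state=42).fit(X_scaled, y)

# Split thresholds adapt to the new scale, so predictions should agree.
agreement = np.mean(clf_raw.predict(X) == clf_scaled.predict(X_scaled))
print(f"prediction agreement: {agreement:.3f}")  # expected: 1.000
```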

Reference:

Do I need to normalize (or scale) data for randomForest (R package)?

Tushar Gupta