I am doing a regression task. Do I need to normalize (or scale) the data for randomForest (the R package)? And is it necessary to also scale the target values? If so: I wanted to use the scale function from the caret package, but I did not find out how to get the data back (descale, denormalize). Do you know of another function (in any package) that is helpful for normalization/denormalization? Thanks, Milan
-
The `scale` function does not belong to `caret`. It is part of the "base" R package. There is an `unscale` function that will reverse the transformation. – IRTFM Jan 22 '12 at 14:11
-
I'm voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `machine-learning` [tag info](https://stackoverflow.com/tags/machine-learning/info). – desertnaut Aug 05 '21 at 16:07
-
It's always weird when SE closes questions having 93 upvotes and 39 favorites. – Dr_Zaszuś Feb 04 '22 at 09:53
6 Answers
No, scaling is not necessary for random forests.
The nature of RF is such that convergence and numerical precision issues, which can sometimes trip up the algorithms used in logistic and linear regression, as well as neural networks, aren't so important. Because of this, you don't need to transform variables to a common scale like you might with a NN.
You don't get any analogue of a regression coefficient, which measures the relationship between each predictor variable and the response. Because of this, you also don't need to consider how to interpret such coefficients, something that is affected by variable measurement scales.
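As a quick sanity check, here is a minimal sketch (assuming the `randomForest` package is installed, and using the built-in `mtcars` data purely for illustration). With the same random seed, fitting on linearly scaled data and back-transforming the predictions should reproduce the raw-data predictions up to floating-point noise:

```r
library(randomForest)

sc     <- scale(mtcars)              # center and scale every column, target included
scaled <- as.data.frame(sc)

set.seed(42)
fit_raw    <- randomForest(mpg ~ ., data = mtcars)
set.seed(42)                         # same seed -> same bootstrap and mtry draws
fit_scaled <- randomForest(mpg ~ ., data = scaled)

## Undo the scaling of the target by hand: multiply by the "scaled:scale"
## attribute, then add back the "scaled:center" attribute.
ctr <- attr(sc, "scaled:center")["mpg"]
dev <- attr(sc, "scaled:scale")["mpg"]
pred_back <- predict(fit_scaled) * dev + ctr

all.equal(unname(predict(fit_raw)), unname(pred_back))  # TRUE, up to fp noise
```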

-
Not only is scaling not necessary, it can smooth out the nonlinear nature of the model. If you have complex nonlinear relationships in p-dimensional space and you have transformed your data, when you back-transform y these nonlinearities are not reflected in the estimate. – Jeffrey Evans Oct 22 '12 at 20:21
-
@JeffreyEvans please please please combine your great comments and post them as an answer. Otherwise this will just slip under everyone's radar. You are saying **"No, not only is it not necessary, it is harmful for the following reasons a) b) c) ..."** – smci Jul 22 '15 at 05:25
-
I think he means that it is not necessary, but it will not harm if you scale all sets (train, test) with the same function defined on the training set. – Keith Sep 28 '17 at 18:37
-
Guess what will happen in the following example if you have 20 predictive features, 15 of them in the [0;10] range and the other 5 – Danylo Zherebetskyy Nov 23 '17 at 00:44
-
Doesn't it depend? If the scales differ greatly between variables, won't scaled features potentially enable shorter trees? And if min-max normalization is used instead of vector normalization, won't the topology of the network be different too? – user3546025 Jan 14 '20 at 00:14
Scaling is done to normalize the data so that no single feature is given undue priority. Scaling matters mostly in algorithms that are distance-based and rely on Euclidean distance.
Random Forest is a tree-based model and hence does not require feature scaling.
This algorithm is based on partitioning, so even if you apply normalization, the result will be the same.
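To make the partitioning point concrete: a tree split depends only on the ordering of a feature's values, and a linear scaling never changes that ordering, so the candidate partitions are identical. A tiny base-R sketch:

```r
x  <- c(3, 10, 250, 7)
xs <- as.numeric(scale(x))   # center and scale

order(x)    # 1 4 2 3
order(xs)   # 1 4 2 3 -- same ordering, hence the same candidate splits
```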

I do not see anything in either the help page or the vignette suggesting that scaling is necessary for a regression variable in `randomForest`. This example at Stats Exchange does not use scaling either.
Copy of my comment: The `scale` function does not belong to pkg:caret. It is part of the "base" R package. There is an `unscale` function in packages grt and DMwR that will reverse the transformation, or you could simply multiply by the scale attribute and then add the center attribute values.
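For reference, here is what that manual back-transformation looks like in base R (a minimal sketch; the attribute names are the ones `scale` attaches to its result):

```r
x  <- matrix(rnorm(20, mean = 50, sd = 10), ncol = 2)
xs <- scale(x)                        # base R scale, not caret

## Multiply by the scale attribute, then add the center attribute.
x_back <- sweep(sweep(xs, 2, attr(xs, "scaled:scale"), "*"),
                2, attr(xs, "scaled:center"), "+")

all.equal(x_back, x, check.attributes = FALSE)   # TRUE
```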
Your conception of why "normalization" needs to be done may require critical examination. A test of non-normality is only needed after the regressions are done, and may not be needed at all if there are no assumptions of normality in the goodness-of-fit methodology. So: why are you asking? Searching SO and Stats.Exchange might prove useful: citation #1; citation #2; citation #3
The `boxcox` function is a commonly used transformation when one does not have prior knowledge of what a distribution "should" be and when you really need to do a transformation. There are many pitfalls in applying transformations, so the fact that you need to ask the question raises concerns that you may be in need of further consultation or self-study.
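For completeness, a minimal sketch of a Box-Cox search using `MASS::boxcox` (it assumes a strictly positive response; the `mpg ~ wt + hp` model is just an illustration):

```r
library(MASS)  # provides boxcox()

fit <- lm(mpg ~ wt + hp, data = mtcars)
bc  <- boxcox(fit, lambda = seq(-2, 2, 0.1), plotit = FALSE)  # profile log-likelihood
lambda <- bc$x[which.max(bc$y)]      # lambda with the highest likelihood

## Apply the transformation (valid for lambda != 0)
mpg_bc <- (mtcars$mpg^lambda - 1) / lambda
```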
-
I understand normalization in my question as a simple linear transformation of the data, e.g. to the interval 0-1. This should be done e.g. when using neural networks. So what I needed when I asked was answered by Hong Ooi. I did not find the unscale function you suggested, but thanks for your effort. – gutompf Jan 22 '12 at 18:36
-
No apology needed. I had manufactured a "false memory" that it was in "base" and that it was mentioned on the help page for `scale`. Your followup question was helpful in setting the record straight. – IRTFM Jan 22 '12 at 20:12
-
@BondedDust: great answer but the last paragraph comes off kind of nasty. Maybe rephrase *"You need to learn when you do and don't need to do a transformation, both on predictors and response variable"* – smci Jul 22 '15 at 05:45
Guess what will happen in the following example? Imagine you have 20 predictive features, 18 of them in the [0;10] range and the other 2 in the [0;1,000,000] range (taken from a real-life example). Question 1: what feature importances will Random Forest assign? Question 2: what will happen to the feature importances after scaling the 2 large-range features?
Scaling is important. It is just that Random Forest is less sensitive to scaling than other algorithms and can work with "roughly" scaled features.
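If you want to test this claim yourself (note that the comments below dispute it), here is a toy setup along the lines of the example above; all variable names and numbers are made up for illustration:

```r
library(randomForest)

set.seed(1)
n <- 500
X <- data.frame(replicate(18, runif(n, 0, 10)))   # 18 small-range features
X$big1 <- runif(n, 0, 1e6)                        # 2 large-range features
X$big2 <- runif(n, 0, 1e6)
y <- rowSums(X[, 1:5]) + X$big1 / 1e5 + rnorm(n)  # signal in a few columns only

set.seed(2); imp_raw    <- importance(randomForest(X, y))
set.seed(2); imp_scaled <- importance(randomForest(as.data.frame(scale(X)), y))

## Compare the two importance rankings side by side.
cbind(raw = imp_raw[, 1], scaled = imp_scaled[, 1])
```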

-
If just predictions are required then common sense is that scaling is not required (Decision Trees are invariant to linear transformations). However, if "feature importance" or "feature selection" or "feature etc." are under consideration then scaled vs unscaled data will give different "feature"-related results. See for example: 1) Strobl et al "Bias in random forest variable importance measures: Illustrations, sources and a solution", BMC Bioinformatics, 2007; 2) http://explained.ai/rf-importance/index.html – Danylo Zherebetskyy Jul 12 '18 at 00:57
-
Old answer, but: this is wrong, and the provided link says nothing about scaling the features. The only mention of scaling is in the _importance measure_, which is entirely different – Hong Ooi Dec 12 '20 at 01:55
If you are going to add interactions to the dataset, that is, new variables that are some function of other variables (usually a simple multiplication), and you don't have a feel for what that new variable stands for (can't interpret it), then you should calculate this variable using the scaled variables.
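As a concrete sketch of what this answer suggests (using `mtcars` purely for illustration; note the comment below argues such terms are unnecessary for Random Forests):

```r
## Build the interaction from scaled columns so that neither variable
## dominates the product merely because of its units.
wt_s <- as.numeric(scale(mtcars$wt))
hp_s <- as.numeric(scale(mtcars$hp))
mtcars$wt_hp <- wt_s * hp_s
head(mtcars$wt_hp)
```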

-
Random Forests is a nonlinear model and the nature of the node splitting statistic accounts for high dimensional interactions. As such, it is unnecessary and quite undesirable to attempt to define interaction variables. – Jeffrey Evans Oct 22 '12 at 20:17
Random Forest inherently uses information gain / the Gini criterion, which is not affected by scaling, unlike many other machine learning methods (such as k-means clustering, PCA, etc.). However, scaling might "arguably" speed up convergence, as hinted in other answers.
