In one of my projects, I was trying to determine which of my 12 features are the strongest drivers of a target variable using RandomForestRegressor (sklearn). RandomForest conveniently gives you a list of feature importances that tells you which features best explain the target. But I'm still unsure what max_features should be for my model, because the default is to use all features, which would make my model just a bagged ensemble of trees. After going through some discussions, it made sense to use n/3 as the maximum number of features if you really want a random forest of trees. I continued with n/3 as the maximum number of features because I was getting a pretty good R-squared.
Very recently I realized that my feature importances come out completely different when I change max_features to n. If feature importances are really relative to each other (they all sum to 1), does it make sense for one to jump from 0.36 to 0.81 when I change the number of features from n/3 to n? So what should max_features be if I'm trying to determine the most explanatory variables, given that I'm getting a pretty good R-squared with both n/3 and n? I'm unable to figure out what I'm missing. Please suggest how to proceed. Thank you very much.
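For reference, here is a minimal sketch of the comparison I'm describing (synthetic data as a stand-in for my actual 12-feature dataset, so the numbers are illustrative only):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for my data: 12 features, one continuous target
X, y = make_regression(n_samples=1000, n_features=12, n_informative=6,
                       noise=10.0, random_state=0)

for max_feats in (12, 4):  # n vs. n/3
    rf = RandomForestRegressor(n_estimators=500, max_features=max_feats,
                               random_state=0)
    rf.fit(X, y)
    print(f"max_features={max_feats}  R^2={rf.score(X, y):.3f}")
    print("importances:", np.round(rf.feature_importances_, 3))
```

Both settings give me a good R-squared, yet the importances shift drastically, which is what confuses me.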

1 Answer
Yes.
First scenario:
Assume that there are two features, feat1 and feat2, which provide the same type of information to the model. Now if both are present in the data and the model picks feat1 first, the importance of feat1 will be large. The model then analyzes the second feature, feat2, and concludes that it doesn't provide any significant increase in knowledge beyond what feat1 already provides. So the importance of feat2 will be relatively small.
Second scenario:
You changed max_features to n/3 and somehow feat1 is now not considered. So the information provided by feat2 is now greater than before, and its importance can increase significantly.
Note that this is for a single model; I don't know how it affects the whole ensemble. You may be able to get more details on https://stats.stackexchange.com.
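To make this concrete, here is a minimal sketch with made-up data where feat2 is a near-duplicate of feat1 (the coefficients and noise levels are arbitrary, chosen only to force the redundancy described above):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(42)
n = 2000
feat1 = rng.randn(n)
feat2 = feat1 + 0.01 * rng.randn(n)   # near-duplicate: same information as feat1
feat3 = rng.randn(n)                  # independent feature
y = 3 * feat1 + feat3 + 0.1 * rng.randn(n)
X = np.column_stack([feat1, feat2, feat3])

for max_feats in (3, 1):  # all features vs. one random candidate per split
    rf = RandomForestRegressor(n_estimators=500, max_features=max_feats,
                               random_state=0).fit(X, y)
    print(f"max_features={max_feats}:",
          np.round(rf.feature_importances_, 3))
```

With all features available at every split, whichever duplicate the trees prefer absorbs most of the importance; with max_features=1 each split must use whichever feature was randomly drawn, so the importance gets spread between feat1 and feat2.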


- In your first scenario, why does the model (every decision tree) pick feat1 first every time, before feat2? In the second scenario, isn't there a case where feat2 isn't considered and feat1 explains the information that feat2 can explain? My question is more: how are these values really feature importances on a relative scale (given all of them sum to 1) if they keep changing drastically with a change in the number of features? – ThReSholD May 02 '18 at 15:59
- @ThReSholD The averaging is done at the end. Secondly, the feature_importance is calculated after using all samples, not sample by sample. – Vivek Kumar May 03 '18 at 01:31