To perform a binary prediction, I have 5 features that I want to use for my random forest classifier, and two of them are not being utilized at all. I understand that the whole point of Machine Learning is to select only the useful features, but the other three features might have biased data, and I want to make sure that all my features are used with equal weight when running my classifier. I can't find a straightforward answer to this question. I use sklearn in Python for this work. Any comments/suggestions would be greatly appreciated.

- You could try running the analysis with only the 2/5 features that aren't being used and compare their predictive power to the run with all 5 - if you get very low accuracy there, they probably just aren't useful predictors. – katardin Mar 25 '20 at 20:00
- Thanks @katardin, but that is what I want to avoid because of the bias. I know for a fact that these two are discriminators; it is just that the training sample for the other 3 features is very likely biased. That is why I am looking for a way to force-include everything. – akaur Mar 25 '20 at 20:09
- If you think you have a kooky training sample, you can draw a new random training sample. You are already including everything in your analysis from what I can tell; it's just giving you results you don't like. Within a limited depth, the trees in a random forest don't have to end up using all of the features. – katardin Mar 25 '20 at 20:13
2 Answers
You can request that all features be considered at every split in a Random Forest classifier by setting `max_features=None`.
From the docs:
max_features : int, float, string or None, optional (default=”auto”)
The number of features to consider when looking for the best split:
- If int, then consider `max_features` features at each split.
- If float, then `max_features` is a fraction and `int(max_features * n_features)` features are considered at each split.
- If “auto”, then `max_features=sqrt(n_features)`.
- If “sqrt”, then `max_features=sqrt(n_features)` (same as “auto”).
- If “log2”, then `max_features=log2(n_features)`.
- If None, then `max_features=n_features`.
The answer in Why is Random Forest with a single tree much better than a Decision Tree classifier? might help in explaining and providing some context.
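As a minimal sketch of the `max_features=None` setting (the data here is synthetic, generated with `make_classification` purely as a stand-in for your own 5-feature matrix and binary labels):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 5 features, only 3 of them informative
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           random_state=0)

# max_features=None -> every split considers all 5 features
clf = RandomForestClassifier(n_estimators=100, max_features=None, random_state=0)
clf.fit(X, y)

# Importances can still come out (near) zero for features the trees never found useful
print(clf.feature_importances_)
```

Note that considering all features at each split does not force the trees to actually split on all of them.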

- Thanks @desertnaut. I have looked at it before and tried using it based on its definition, but in the end, I get "0" for two of my features when I check their importance in the classifier. I am very confused about it. – akaur Mar 25 '20 at 21:21
- @Phyast10 the post answered exactly what you asked for in your question. If you had tried this already, you should have included this info in your question (there's a reason why we ask for code, and not verbal descriptions). Given this extra information, your assumption is **wrong**: zero importance doesn't mean the algo has not *considered* these features; it means that it has, but it has found them uninformative. Under these circumstances, insisting that the model has to use all features with equal weight is arbitrary, unfounded, and not at all how ML works. – desertnaut Mar 26 '20 at 13:54
- @Phyast10 in fact, the standard RF algorithm with the default setting for `max_features` is very well equipped for such cases, ensuring that, for some splits at least, some of the more informative features will be *excluded* from selection, leaving room for the others to be considered. – desertnaut Mar 26 '20 at 13:55
What can help you is setting the parameter `max_features=1`, so each node will take one (uniformly distributed) random feature and will be forced to use it. Nevertheless, you also need to set the depth of the trees, because otherwise nodes will keep being added indefinitely until one of the main features is reached.
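A minimal sketch of that configuration (the data is synthetic, generated with `make_classification` just as a placeholder for your own features and labels, and `max_depth=10` is an arbitrary illustrative cap):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 5 features, only 3 of them informative
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           random_state=0)

# max_features=1: each split draws a single feature uniformly at random and must use it;
# max_depth caps tree growth so nodes aren't added indefinitely
clf = RandomForestClassifier(n_estimators=100, max_features=1, max_depth=10,
                             random_state=0)
clf.fit(X, y)
print(clf.feature_importances_)
```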

- Thanks for your feedback. I realized that the values associated with those features were very small and they were all being seen as "0s". This is why those two features were not included. Now I have scaled them accordingly, although Random Forest should not have this issue. – akaur Jul 31 '20 at 16:32
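A minimal sketch of the kind of rescaling described in that last comment (the matrix and column scales below are purely illustrative; tree-based models are normally insensitive to feature scale, as the comment notes):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Purely illustrative matrix: three ordinary columns plus two on a tiny scale
rng = np.random.default_rng(0)
X = np.column_stack([rng.random((100, 3)), rng.random((100, 2)) * 1e-9])

# Rescale every column into [0, 1] before fitting the classifier
X_scaled = MinMaxScaler().fit_transform(X)
```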