
I am currently working on random forests under scikit-learn.

Is there a way to change the weights of each estimator in the generated random forest?

NWgs
    Hi NWgs, welcome to StackOverflow! In order to help other from the SO community to answer your question, the best is to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). This way, others can point out what is wrong with your code. See also [this question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to see how to ask a better question. Cheers :) – Christian Feb 10 '20 at 08:33
  • Hi NWgs, is there a reason why you'd want to do this? Depending on your problem, it may be better to get help from the statistics community: https://stats.stackexchange.com – Nathan Feb 10 '20 at 08:44

1 Answer


Are you asking how to change the weights of each estimator individually, or how to change the weight given to each tree's answer in the voting system when calling predict()?

When you have a fitted random forest, the attribute estimators_ contains a list of decision trees, each of which can be edited individually, e.g.:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=2)

df = pd.DataFrame([[1, True], [2, False]])

model.fit(df[0].to_numpy().reshape(-1, 1), df[1])

print(model.estimators_)

Outputs:

[DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                    max_depth=None, max_features='auto', max_leaf_nodes=None,
                    min_impurity_decrease=0.0, min_impurity_split=None,
                    min_samples_leaf=1, min_samples_split=2,
                    min_weight_fraction_leaf=0.0, presort='deprecated',
                    random_state=1942352063, splitter='best'),
 DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                        max_depth=None, max_features='auto', max_leaf_nodes=None,
                        min_impurity_decrease=0.0, min_impurity_split=None,
                        min_samples_leaf=1, min_samples_split=2,
                        min_weight_fraction_leaf=0.0, presort='deprecated',
                        random_state=1414900336, splitter='best')]

So you can select the first one just using model.estimators_[0].
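As a minimal sketch of what "editing individually" can mean in practice, you can even swap one fitted tree out for another trained with different settings, since the ensemble delegates to whatever is in estimators_ (the dataset and the `replacement` tree below are made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data, just to have something to fit on.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

model = RandomForestClassifier(n_estimators=3, random_state=0)
model.fit(X, y)

# Train a differently-configured tree on the same data and
# swap it in for the first estimator of the forest.
replacement = DecisionTreeClassifier(max_depth=2, random_state=0)
replacement.fit(X, y)
model.estimators_[0] = replacement

# The ensemble now uses the swapped-in tree when predicting.
preds = model.predict(X[:5])
print(preds)
```

This works because predict() simply aggregates over the current contents of estimators_, but it bypasses scikit-learn's own fitting logic, so treat it as a hack rather than a supported workflow.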

Then, if you read the decision tree documentation, note that feature_importances_ is a derived, read-only property computed from the fitted tree's splits, so you can inspect it per tree but not assign new weights to it directly.

If your question is how to change the random forest voting system, then I recommend you take a look at the source code, but be aware that modifying this behaviour is usually not a good idea.

As you can see here, a random forest averages the predicted class probabilities of all trees and picks the class with the highest mean probability, so you could work with the prediction probabilities of each decision tree individually.
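Building on that, one way to get a weighted vote without touching scikit-learn internals is to average the per-tree probabilities yourself. This is a sketch, not a scikit-learn feature: the `weights` array below is a hypothetical per-tree weighting chosen for illustration, and the dataset is made up.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data, just to have something to fit on.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)

# Hypothetical per-tree weights (NOT part of scikit-learn's API).
weights = np.array([0.5, 0.3, 0.2])

# Stack each tree's class probabilities: shape (n_trees, n_samples, n_classes).
probas = np.stack([tree.predict_proba(X) for tree in model.estimators_])

# Weighted average over trees, then pick the class with the highest probability.
weighted = np.average(probas, axis=0, weights=weights)
custom_pred = model.classes_[np.argmax(weighted, axis=1)]
print(custom_pred[:5])
```

With uniform weights this reproduces model.predict() exactly, since the built-in prediction is just the unweighted version of the same averaging.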

Noki
  • Thank you for your detailed response. And yes, I am currently testing something on the weights of the random forests generated to see if it outperforms the built-in generator for the weights or not. I'll get into the code as you suggested then. – NWgs Feb 11 '20 at 10:14
  • I am looking for a python/scikit learn related or similar function as the one in MATLAB: [TreeBagger](https://www.mathworks.com/help/stats/treebagger.html). You can customize the creation of trees on this class. Is [BaggingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html#sklearn.ensemble.BaggingClassifier) similar to this? – NWgs Feb 11 '20 at 11:35