
Why does a random forest with n_estimators equal to 1 sometimes perform worse than a decision tree, even after setting bootstrap to False?

I'm trying different machine learning models to predict credit card default. I tried a random forest and a decision tree, but the random forest seems to perform worse. I then tried a random forest with only 1 tree, which I expected to behave the same as a decision tree, but it still performed worse.

Jacky

1 Answer


A specific answer to your observations depends on the implementations of the decision tree (DT) and random forest (RF) methods you're using. That said, the three most likely reasons are:

  1. bootstrapping: Although you mention that you set this to False, in its most general form an RF uses two kinds of random sampling: of the dataset (bootstrapping the rows) and of the features. Perhaps the setting you changed controls only one of these; in scikit-learn, for example, bootstrap affects only the row sampling, while max_features controls the feature sampling. Even if both are off, some RF implementations have other parameters that control how many features are considered at each split of the tree and how they are selected.

  2. tree hyperparameters: Related to the previous point, the other thing to check is whether all of the other tree hyperparameters are the same. Tree depth, the minimum number of points per leaf node, etc. would all have to match to make the methods directly comparable.

  3. growing method: Lastly, remember that trees are learned via indirect/heuristic losses (split criteria) that are usually optimized greedily. Accordingly, there are different algorithms for growing trees (e.g., C4.5), and the DT and RF implementations may be using different approaches.
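To check points 1 and 2 concretely, here is a minimal sketch assuming scikit-learn (the parameter names n_estimators and bootstrap suggest it, but the question doesn't say which library is used). It lists the per-tree hyperparameters whose defaults differ between RandomForestClassifier and DecisionTreeClassifier; TREE_PARAMS and differing_defaults are hypothetical helper names used only for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Tree-growing hyperparameters that exist on both estimators
# (the forest forwards these to each of its trees).
TREE_PARAMS = [
    "criterion", "max_depth", "min_samples_split", "min_samples_leaf",
    "min_weight_fraction_leaf", "max_features", "max_leaf_nodes",
    "min_impurity_decrease", "ccp_alpha",
]


def differing_defaults(forest, tree):
    """Return {param: (forest_value, tree_value)} for params that differ."""
    f_params, t_params = forest.get_params(), tree.get_params()
    return {
        name: (f_params[name], t_params[name])
        for name in TREE_PARAMS
        if f_params[name] != t_params[name]
    }


# The asker's setup: one tree, no row bootstrapping, everything else default.
rf = RandomForestClassifier(n_estimators=1, bootstrap=False)
dt = DecisionTreeClassifier()
print(differing_defaults(rf, dt))
```

This should report only max_features: the forest by default samples a subset of features at each split ("sqrt" in recent releases), whereas the standalone tree considers all of them (None). So bootstrap=False alone does not remove the feature randomness.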

If all of these match, the differences should be minor. Any remaining differences (i.e., "in some cases") may come from randomness in the learning procedure (e.g., the order in which candidate features are evaluated and how ties between equally good splits are broken) combined with the greedy learning scheme, which can lead to suboptimal trees. Mitigating these issues through ensemble diversity is the main motivation for RFs.
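Here is a sketch of the "everything matches" case, again assuming scikit-learn and using synthetic data from make_classification as a stand-in for the credit card dataset (the sizes are arbitrary). With max_features=None and bootstrap=False, the forest's single tree and a standalone tree see the same data and settings. Reusing the integer seed the forest drew for its tree (exposed as rf.estimators_[0].random_state) also makes the tie-breaking identical, so the predictions should agree exactly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (assumption: stands in for the real data).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# One tree, no row bootstrapping, and ALL features considered at every split.
rf = RandomForestClassifier(n_estimators=1, bootstrap=False,
                            max_features=None, random_state=0)
rf.fit(X_train, y_train)

# The forest draws an integer seed for its tree; reuse it so that the
# feature-evaluation order (and hence tie-breaking) is the same.
tree_seed = rf.estimators_[0].random_state
dt = DecisionTreeClassifier(random_state=tree_seed).fit(X_train, y_train)

agreement = (rf.predict(X_test) == dt.predict(X_test)).mean()
print(f"prediction agreement: {agreement:.3f}")
print(f"RF accuracy: {rf.score(X_test, y_test):.3f}, "
      f"DT accuracy: {dt.score(X_test, y_test):.3f}")
```

If you instead give the DecisionTreeClassifier a different random_state, small differences can reappear whenever several splits are equally good, which is the residual randomness described above.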

ATony