
I'm using scikit-learn to create a Random Forest, and I want to find the depth of each individual tree. It seems like a simple attribute to have, but according to the documentation (http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) there is no way of accessing it.

If this isn't possible, is there a way of accessing the tree depth from a Decision Tree model?

Any help would be appreciated. Thank you.

iltp38

1 Answer


Each instance of RandomForestClassifier has an estimators_ attribute, which is a list of DecisionTreeClassifier instances. The documentation shows that an instance of DecisionTreeClassifier has a tree_ attribute, which is an instance of the (undocumented, I believe) Tree class. Some exploration in the interpreter shows that each Tree instance has a max_depth attribute, which appears to be what you're looking for -- again, it's undocumented.

In any case, if forest is your instance of RandomForestClassifier, then:

>>> [estimator.tree_.max_depth for estimator in forest.estimators_]
[9, 10, 9, 11, 9, 9, 11, 7, 13, 10]

should do the trick.

Each estimator also has a get_depth() method that can be used to retrieve the same value with briefer syntax:

>>> [estimator.get_depth() for estimator in forest.estimators_]
[9, 10, 9, 11, 9, 9, 11, 7, 13, 10]
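
Since the question also asks about a single decision tree: the same method should work on a standalone DecisionTreeClassifier. Here is a minimal sketch, assuming a scikit-learn version recent enough to provide get_depth() and using the iris data purely for illustration (the variable names are my own):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only; any fitted DecisionTreeClassifier behaves the same way.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Depth the fitted tree actually reached.
depth = tree.get_depth()

# The underlying Tree object exposes the same number as tree_.max_depth.
same_value = depth == tree.tree_.max_depth
print(depth, same_value)
```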

To avoid confusion, note that each estimator (and not each estimator's tree_) also has an attribute called max_depth, which returns the value the parameter was set to rather than the depth of the actual tree. How estimator.get_depth(), estimator.tree_.max_depth, and estimator.max_depth relate to each other is clarified in the example below:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
# Cap every tree at depth 6; random_state makes the run repeatable
clf = RandomForestClassifier(n_estimators=3, random_state=4, max_depth=6)
iris = load_iris()
clf.fit(iris['data'], iris['target'])
# (actual depth, actual depth via tree_, max_depth parameter setting) per tree
[(est.get_depth(), est.tree_.max_depth, est.max_depth) for est in clf.estimators_]

Out:

[(6, 6, 6), (3, 3, 6), (4, 4, 6)]

Setting max_depth to its default value of None would allow the first tree to expand to depth 7, and the output would be:

[(7, 7, None), (3, 3, None), (4, 4, None)]
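
If you want to convince yourself that these numbers are the actual depths and not just the cap, one option is to walk each tree's node arrays and measure the depth directly. The sketch below is only an illustration: measured_depth is a hypothetical helper of my own, and it assumes the children_left and children_right arrays on tree_ mark a leaf by giving it the same value for both children.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def measured_depth(fitted_tree):
    """Walk the node arrays of a fitted tree and return the deepest level reached."""
    left = fitted_tree.tree_.children_left
    right = fitted_tree.tree_.children_right
    deepest = 0
    stack = [(0, 0)]  # (node id, depth); node 0 is the root
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        # Assumption: a leaf has identical left/right child entries (both -1).
        if left[node] != right[node]:
            stack.append((left[node], depth + 1))
            stack.append((right[node], depth + 1))
    return deepest


X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=3, random_state=4, max_depth=6).fit(X, y)

measured = [measured_depth(est) for est in forest.estimators_]
reported = [est.get_depth() for est in forest.estimators_]
print(measured, reported)
```

Both lists should agree, and no entry should exceed the max_depth that was passed in.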
joelostblom
jme
  • Thank you!! This is exactly what I have been looking for. Similarly, do you know if there is a way to manually delete a particular tree from the random forest? I am trying to delete trees with less than a certain depth. – iltp38 Dec 11 '15 at 00:43
  • It *might* be as simple as deleting the estimators from the list. That is, to delete the first tree, `del forest.estimators_[0]`. Or to only keep trees with depth 10 or above: `forest.estimators_ = [e for e in forest.estimators_ if e.tree_.max_depth >= 10]`. But it doesn't look like `RandomForestClassifier` was built to work this way, and by modifying `forest.estimators_` you might break things. You can try it out and see if the results seem reasonable, though. If you do, you might want to update `forest.n_estimators = len(forest.estimators_)` for good measure. – jme Dec 11 '15 at 03:38
  • This answer is incorrect; this tells you the maximum _allowed_ depth of each tree in the forest, not the actual depth. So for example a random forest trained with `max_depth=10` will return: `[10, 10, 10, ...]` – jon_simon Dec 29 '17 at 00:56
  • It returns whichever is lower of the max_depth argument and the actual depth value. – Ken Fehling Jul 31 '18 at 02:52
  • See https://datascience.stackexchange.com/questions/19842/anyway-to-know-all-details-of-trees-grown-using-randomforestclassifier-in-scikit/36228#36228 to get the actual max depth for each tree in a forest. – Terence Parr Jul 31 '18 at 19:44
  • @jme - If I understood this correctly, then the maximum depth for the example is 11? By default the `DecisionTreeClassifier` takes `max_depth` if no value is provided. – Chetan Arvind Patil Sep 30 '18 at 01:17
  • @JonathanSimon I believe your comment is inaccurate. If you check the docs of the DecisionTreeClassifier https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier.get_depth, you will see that it has a `get_depth()` method, which "Returns the depth of the decision tree". If you click to view the source of this method, you can see that it returns `self.tree_.max_depth`... – joelostblom Dec 01 '19 at 08:45
  • A quick test with the iris example on the same doc page shows that changing the `max_depth` parameter only influences `tree_.max_depth` if it is lower than the depth the tree would otherwise expand to, as @KenFehling wrote above. – joelostblom Dec 01 '19 at 08:45
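
On the follow-up about dropping shallow trees: if modifying `forest.estimators_` in place feels risky, one alternative is to leave the forest untouched and average the predictions of only the trees you want to keep. This is a rough sketch and not something the library documents as a supported workflow; `min_depth`, `kept`, and the iris data are illustrative assumptions, and it relies on the sub-trees sharing the forest's class ordering.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

min_depth = 4  # hypothetical threshold; pick whatever suits your data

# Select the trees that are deep enough, without touching forest.estimators_.
kept = [est for est in forest.estimators_ if est.get_depth() >= min_depth]

predictions = None
if kept:
    # Average the class probabilities of the kept trees, then take the
    # most probable class and map it back to the forest's class labels.
    proba = np.mean([est.predict_proba(X) for est in kept], axis=0)
    predictions = forest.classes_[np.argmax(proba, axis=1)]

print(len(kept), "of", len(forest.estimators_), "trees kept")
```

Whether a forest reduced this way still performs well is something you would have to check on held-out data.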