
I have trained a random forest model using scikit-learn, and now I want to save its tree structures in a text file so I can use them elsewhere. According to this link, a tree object consists of a number of parallel arrays, each holding some information about the nodes of the tree (e.g. left child, right child, which feature the node examines, ...). However, there seems to be no information about the class label corresponding to each leaf node! It isn't even mentioned in the examples provided in the link above.

Does anyone know where the class labels are stored in the scikit-learn decision tree structure?

blacksite
whoAmI

1 Answer


Take a look at the docs for sklearn.tree.DecisionTreeClassifier.tree_.value:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()

clf.fit(iris.data, iris.target)

print(clf.classes_)

[0 1 2]

print(clf.tree_.value)

[[[ 50.  50.  50.]]

 [[ 50.   0.   0.]]

 [[  0.  50.  50.]]

 [[  0.  49.   5.]]

 [[  0.  47.   1.]]

 [[  0.  47.   0.]]

 [[  0.   0.   1.]]

 [[  0.   2.   4.]]

 [[  0.   0.   3.]]

 [[  0.   2.   1.]]

 [[  0.   2.   0.]]

 [[  0.   0.   1.]]

 [[  0.   1.  45.]]

 [[  0.   1.   2.]]

 [[  0.   0.   2.]]

 [[  0.   1.   0.]]

 [[  0.   0.  43.]]]

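Each row in clf.tree_.value "contains the constant prediction value of each node" (quoted from help(clf.tree_)). Row i belongs to node i, and in the output above it holds the number of training samples of each class that reached that node. The columns correspond index-to-index to clf.classes_, so the class label of a leaf is the entry of clf.classes_ at the position of the largest value in that leaf's row. (Newer scikit-learn releases may store per-class fractions instead of counts; the column order is the same.)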
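As a minimal sketch of that lookup (the variable names `is_leaf`, `per_class`, `node_labels` and `leaf_labels` are made up for illustration), the following picks out the leaf nodes and maps each one to its class label. It assumes a single-output classifier, so only output 0 of `tree_.value` is read.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

tree = clf.tree_

# A node is a leaf when it has no left child (children_left == -1).
is_leaf = tree.children_left == -1

# value has shape (node_count, n_outputs, max_n_classes); take output 0.
per_class = tree.value[:, 0, :]

# Column j of per_class lines up with clf.classes_[j], so the argmax
# over columns gives the index of the majority class at every node.
node_labels = clf.classes_[np.argmax(per_class, axis=1)]

# Map node index -> class label, for leaf nodes only.
leaf_labels = {int(i): int(node_labels[i]) for i in np.flatnonzero(is_leaf)}
print(leaf_labels)
```

`clf.apply(X)` returns the index of the leaf each sample lands in, so `node_labels[clf.apply(X)]` should agree with `clf.predict(X)`.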

See this answer for (barely) more details.
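For the original goal of writing a forest's trees to a text file, one possible sketch is below. The helper `tree_to_dict`, the JSON layout, and the file name `forest_trees.json` are all assumptions made for illustration, not a scikit-learn export format. It stores the parallel arrays from `tree_` plus one label per node, looked up through the forest's own `classes_`.

```python
import json

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=3, random_state=0)
forest.fit(iris.data, iris.target)


def tree_to_dict(estimator, class_labels):
    """Hypothetical helper: flatten one fitted tree into plain lists."""
    t = estimator.tree_
    # Index of the majority class at every node (single-output case).
    winners = np.argmax(t.value[:, 0, :], axis=1)
    return {
        "children_left": t.children_left.tolist(),
        "children_right": t.children_right.tolist(),
        "feature": t.feature.tolist(),
        "threshold": t.threshold.tolist(),
        # .item() converts NumPy scalars to plain Python values for JSON.
        "label": [class_labels[w].item() for w in winners],
    }


dump = [tree_to_dict(est, forest.classes_) for est in forest.estimators_]
with open("forest_trees.json", "w") as fh:
    json.dump(dump, fh)
```

Note that a scikit-learn forest averages the per-tree class probabilities rather than taking a majority vote over leaf labels, so to reproduce `forest.predict` exactly elsewhere you would probably want to store the full `value` row of each leaf instead of a single label.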

blacksite
  • Adding to the answer, for each row in this array, you can do `clf.classes_[np.argmax(value)]` to get the predicted class label. – Vivek Kumar May 24 '17 at 13:21
  • @not_a_robot Thanks, you explained it perfectly. However, I still can't find where clf.tree_.value is mentioned in the documentation. I guess I don't need it anymore since your answer is exactly what I was looking for. – whoAmI May 27 '17 at 05:07
  • Just another quick question. It looks like clf.classes_ gives me labels of [0,...,n-1], regardless of what labels I use. Am I right? I was expecting [1,...,n] in my case. – whoAmI May 27 '17 at 06:07
  • I believe the labels are zero-indexed, which is why it's [0, *n*-1]. – blacksite May 27 '17 at 12:55
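The indexing question in the last two comments can be checked directly. The sketch below assumes the asker's labels were 1..n (simulated here by shifting the iris targets by one) and compares `classes_` on a standalone tree, on a forest, and on one of the forest's sub-trees. As far as I can tell, a standalone classifier and the forest itself keep the original labels, while the individual trees inside a forest are fitted on re-encoded 0..n-1 targets, which would explain what the asker saw.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
y = iris.target + 1  # assumed labels 1, 2, 3 instead of 0, 1, 2

# A standalone tree reports the labels it was trained on.
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, y)
print(clf.classes_)

# The forest also reports the original labels ...
forest = RandomForestClassifier(n_estimators=3, random_state=0)
forest.fit(iris.data, y)
print(forest.classes_)

# ... but each sub-tree only sees the encoded class indices.
print(forest.estimators_[0].classes_)
```

So when reading leaves out of `forest.estimators_`, index into `forest.classes_` rather than the sub-tree's own `classes_` to recover the original labels.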