8

I am using a scikit-learn DecisionTreeClassifier on a 3-class dataset. After fitting the classifier, I access the leaf nodes through the tree_ attribute to get the number of instances of each class that end up in a given node.

from sklearn import tree

clf = tree.DecisionTreeClassifier(max_depth=5)
clf.fit(X, y)
# let's assume there is a leaf node with id 5
print(clf.tree_.value[5])

This will print out:

>>> array([[  0.,   1.,  68.]])

But how do I know which position in that array belongs to which class? The classifier has a classes_ attribute, which is also an array:

>>> clf.classes_
array(['CLASS_1', 'CLASS_2', 'CLASS_3'], dtype=object)

Maybe index 1 of the value array matches the class at index 1 of the classes_ array, and so on?

nemi
  • Please post an answer separately instead of editing it into the question. Then you can accept your own answer to mark the question as closed. – Fred Foo Oct 08 '14 at 09:56
  • @larsmans, is that the common rule? I once read a post where someone did that and got a comment saying he should do what I did. Your reputation seems high enough, though. I'll do that and hope no one tells me to do the contrary :S – nemi Oct 09 '14 at 12:23

2 Answers

9

I asked about this on the scikit-learn mailing list and my guess was right: index i of the value array corresponds to the class at index i of the classes_ array. So index 1 of the value array matches the class at index 1 of clf.classes_, and so on.
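To make the mapping explicit, here is a minimal sketch that pairs each class label with its entry in the node's value array. The helper name class_counts is hypothetical (it is not part of scikit-learn), and it assumes a fitted single-output classifier, so that clf.tree_.value[node_id] has shape (1, n_classes). Depending on the scikit-learn version, the entries may be weighted sample counts or per-class fractions; either way the positions line up with classes_.

```python
def class_counts(clf, node_id):
    """Map each class label to its entry in tree_.value for one node.

    Hypothetical helper, not a scikit-learn function. Assumes a fitted
    single-output classifier, so tree_.value[node_id] has shape
    (1, n_classes) and row 0 is aligned with clf.classes_.
    """
    # Row 0 holds the per-class values for the single output.
    node_values = clf.tree_.value[node_id][0]
    # Position i in node_values belongs to clf.classes_[i].
    return {label: float(v) for label, v in zip(clf.classes_, node_values)}
```

With the node from the question, class_counts(clf, 5) would give {'CLASS_1': 0.0, 'CLASS_2': 1.0, 'CLASS_3': 68.0}.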

nemi
0

No, it is not clf.classes_ but clf.tree_.feature that contains the column indices of X. And if X is a pandas DataFrame, X.columns contains the column names. You can find more detailed information in a similar question.

Jihun
  • Hi @jihun, I guess you misunderstood: the value array (clf.tree_.value[5]) does not contain features, it contains the counts of instances of each class in that node (in this case node 5). What I need is to map those counts to the corresponding class names. – nemi Oct 06 '14 at 19:46