
I am building a decision tree in scikit-learn. Searching Stack Overflow, one can find ways to extract the rules associated with each leaf. My goal now is to apply these rules to a new observation and see which leaf the new observation ends up in.

Here is an abstract example. Suppose the rule we got for leaf #1 is: if a<5 and b>7, then the observation belongs to leaf #1. Now I would like to take a new observation, apply these rules to it, and check which leaf it ends up in.

I am trying to use a decision tree for segmentation.
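
To make the goal concrete, the following is roughly the kind of hand-coded function I want to avoid writing (the feature names a and b and the thresholds come from the toy rule above; everything else is made up):

def assign_leaf(a, b):
    # rule extracted for leaf #1: a < 5 and b > 7
    if a < 5 and b > 7:
        return 1
    # ... one hand-written branch per remaining leaf
    return None

Writing this out manually for a large tree is exactly what I want to avoid.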

user1700890
  • Is the documentation on the official site not sufficient? http://scikit-learn.org/stable/modules/tree.html – Patrick the Cat Sep 26 '16 at 14:34
  • @Mai Visualization with graphviz gives you an idea about the rules, but one needs to manually program these rules in order to use them on a new observation (to figure out the leaf). I am looking to automatically extract a function from the decision tree, which would eliminate manual coding. Manual coding is fine when the tree is small, but when it is large, it is virtually impossible. – user1700890 Sep 26 '16 at 14:42
  • 1
    How about using the _tree attribute? I think you can look into the source code to see how _tree attribute is used and go from there. I am not quite getting your problem. Perhaps a bit clarification could be helpful. – Patrick the Cat Sep 26 '16 at 14:45
  • @Mai, that makes sense, but I was hoping there was a shortcut. – user1700890 Sep 26 '16 at 14:47
  • I think ML models are generally built to predict future instances. DTs are trained and used to predict the outcome of future observations. Why do you need to extract rules? – Patrick the Cat Sep 26 '16 at 14:50
  • @Mai, I would like to study the properties of the leaves to understand the segmentation underlying the classification decision. I agree this is a somewhat non-standard way to use a decision tree. – user1700890 Sep 26 '16 at 15:09

2 Answers


You can use the apply method of DecisionTreeClassifier to get the index of the leaf that each sample ends up in.

from sklearn.tree import DecisionTreeClassifier

# fit a tree on a tiny toy data set
clf = DecisionTreeClassifier()
clf.fit([[1, 2, 3], [10, 19, 20], [6, 7, 7]], [1, 1, 0])

# apply returns the index of the leaf each sample lands in
clf.apply([[6, 7, 7]])
# array([3])
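
Since your end goal is segmentation, here is a minimal sketch (assuming clf has been fit as above; the new samples are made up) of grouping observations by the leaf they land in:

import numpy as np

X_new = [[1, 2, 3], [6, 7, 7], [11, 20, 21]]
leaves = clf.apply(X_new)              # leaf index for every sample
for leaf in np.unique(leaves):
    rows = np.where(leaves == leaf)[0]
    print("leaf %d: rows %s" % (leaf, list(rows)))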
maxymoo

An example of using a decision tree classifier with scikit-learn can be found here. That example includes training the classifier and validating the results on a second data set.

The predict function returns the result for a new data sample when the trained decision tree is applied to it:

 predict(X, check_input=True)

where X is the feature vector of the new data sample under examination.
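
For example, a minimal sketch (the tiny training set is made up for illustration):

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
clf.fit([[1, 2, 3], [10, 19, 20], [6, 7, 7]], [1, 1, 0])

X_new = [[5, 6, 6]]            # feature vector of the new sample
print(clf.predict(X_new))      # predicted class label for X_new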

This link might help you to understand how to output the rules of your decision tree classifier.
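
If you only need a quick textual dump of the rules, a minimal sketch along the lines of that link (assuming clf is a fitted DecisionTreeClassifier and feature_names is a list of your column names) that walks the underlying tree_ structure:

from sklearn.tree import _tree

def print_rules(clf, feature_names):
    tree_ = clf.tree_

    def recurse(node, indent=""):
        # internal nodes have a split feature; leaves are marked TREE_UNDEFINED
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_names[tree_.feature[node]]
            threshold = tree_.threshold[node]
            print("%sif %s <= %.4f:" % (indent, name, threshold))
            recurse(tree_.children_left[node], indent + "    ")
            print("%selse:  # %s > %.4f" % (indent, name, threshold))
            recurse(tree_.children_right[node], indent + "    ")
        else:
            print("%sreturn %d  # leaf node id" % (indent, node))

    recurse(0)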

tfv