
I'm using scikit-learn's DecisionTreeClassifier to construct a decision tree for a particular feature set. To my surprise, one feature that I thought was significant was excluded.

Is there a way to take a peek under the hood, and figure out why the algorithm chose to exclude that feature?

Or really, get more information / analytics about any part of the decision-tree construction process?

Dun Peal
  • How are you saying that it has been excluded? Are you looking at the underlying `tree_`? Maybe check the `feature_importance_` of the fitted tree. – Vivek Kumar Aug 09 '17 at 01:14
  • @VivekKumar: I used `export_graphviz()` and the feature isn't in the rendered tree. – Dun Peal Aug 11 '17 at 15:46

1 Answer


Regarding the feature being ignored: it's hard to tell why without more detail, but one thing you can try is "playing" with the `sample_weight` parameter to change the weight each sample gets. Upweighting the samples where the excluded feature matters can push the tree toward splitting on it; you can read an excellent explanation here.
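A minimal sketch of the idea, using the iris data as a stand-in for your feature set (the weighting rule here is purely hypothetical; choose one that reflects which samples you care about):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Hypothetical weighting: double the weight of class-2 samples.
# Changing sample weights changes the impurity calculations, and
# therefore which features the tree chooses to split on.
weights = np.where(y == 2, 2.0, 1.0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y, sample_weight=weights)

print(clf.feature_importances_)  # one importance score per feature
```

Comparing `feature_importances_` before and after reweighting shows whether the weights actually shifted the tree toward the feature you expected.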

Also, for debugging, there is a way to save an image of the trained tree, as demonstrated in the documentation:

The export_graphviz exporter supports a variety of aesthetic options, including coloring nodes by their class (or value for regression) and using explicit variable and class names if desired. IPython notebooks can also render these plots inline using the Image() function:

from sklearn import tree          # for export_graphviz
import pydotplus                  # renders DOT data to an image
from IPython.display import Image

dot_data = tree.export_graphviz(clf, out_file=None,  # clf: the trained classifier
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())
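Before reaching for the image at all, you can check programmatically whether a feature was ever used: a feature with importance 0.0 was never chosen for any split, which is exactly why it won't show up in the rendered tree. A short sketch (again using iris as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# feature_importances_ sums to 1 across all features; an entry of
# exactly 0.0 means the feature appears nowhere in the fitted tree.
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")

unused = [name for name, imp in
          zip(iris.feature_names, clf.feature_importances_) if imp == 0.0]
print("never split on:", unused)
```

If your feature shows up with a small but nonzero importance, it was used but only deep in the tree, which `export_graphviz` will confirm visually.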


Gal Dreiman