
I'm using scikit-learn's DecisionTreeClassifier to construct a decision tree for a particular feature set. To my surprise, one feature that I thought was significant was excluded.

Is there a way to take a peek under the hood, and figure out why the algorithm chose to exclude that feature?

Or really, get more information / analytics about any part of the decision-tree construction process?

Dun Peal
  • How are you saying that it has been excluded? Are you looking at the underlying `tree_`? Maybe check the `feature_importance_` of the fitted tree. – Vivek Kumar Aug 09 '17 at 01:14
  • @VivekKumar: I used `export_graphviz()` and the feature isn't in the rendered tree. – Dun Peal Aug 11 '17 at 15:46

1 Answer


Regarding the feature being ignored: it's hard to tell why without more detail, but one thing you can try is "playing" with the `sample_weight` parameter to change the weight each sample gets. Upweighting the samples where the excluded feature matters can push the tree toward splitting on it; you can read an excellent explanation here.
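A minimal sketch of the idea, using the iris data as a stand-in for your feature set (the weighting rule here is purely hypothetical; choose one that reflects which samples you care about):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Hypothetical weighting: double the weight of class-2 samples.
# Changing sample weights changes the impurity calculations, and
# therefore which features the tree chooses to split on.
weights = np.where(y == 2, 2.0, 1.0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y, sample_weight=weights)

print(clf.feature_importances_)  # one importance score per feature
```

Comparing `feature_importances_` before and after reweighting shows whether the weights actually shifted the tree toward the feature you expected.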

Also, for debugging, there is a way to save an image of the trained tree, as demonstrated in the documentation:

The export_graphviz exporter supports a variety of aesthetic options, including coloring nodes by their class (or value for regression) and using explicit variable and class names if desired. IPython notebooks can also render these plots inline using the Image() function:

from sklearn import tree          # for export_graphviz
import pydotplus                  # renders DOT data to an image
from IPython.display import Image

dot_data = tree.export_graphviz(clf, out_file=None,  # clf: the trained classifier
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())
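Before reaching for the image at all, you can check programmatically whether a feature was ever used: a feature with importance 0.0 was never chosen for any split, which is exactly why it won't show up in the rendered tree. A short sketch (again using iris as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# feature_importances_ sums to 1 across all features; an entry of
# exactly 0.0 means the feature appears nowhere in the fitted tree.
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")

unused = [name for name, imp in
          zip(iris.feature_names, clf.feature_importances_) if imp == 0.0]
print("never split on:", unused)
```

If your feature shows up with a small but nonzero importance, it was used but only deep in the tree, which `export_graphviz` will confirm visually.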


Gal Dreiman