Number of feature_importances_ does not match no of features in Scikit learn's DecisionTreeClassifier

Question

I fitted a decision tree to a dataset with 20 inputs and 1 categorical output using the following Python code (wordsDatum is just an array containing the inputs in columns 0 to 19 and the output in column 20):

from sklearn import tree

clsfr = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf=50)
clsfr = clsfr.fit(wordsDatum[:, 0:19], wordsDatum[:, 20])
for items in clsfr.feature_importances_:
    print(items)

When I print the feature importances, I only get 19 values - this is strange considering I have 20 features. Any ideas what might be going on here?

Thanks for your help!

score 1 · Answer 1 · edited May 23 '17 at 11:43

This is due to how slicing works in Python: a slice start:stop includes the start index but excludes the stop index. NumPy arrays follow the same rule as lists.

But in summary, if you define a list like this:

my_list = [0, 1, 2, 3, 4, 5]

and you call my_list[0:5], it will give you:

[0, 1, 2, 3, 4]
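As a quick illustration of the same rule applied to the numbers in the question, here is a small sketch using plain lists. The variable names nineteen and twenty are made up for this example; the point is that a slice start:stop holds stop - start items when both indices are in range.

```python
my_list = [0, 1, 2, 3, 4, 5]

# The stop index is excluded, so 0:5 covers indices 0..4 (five items).
first_five = my_list[0:5]

# Hypothetical 25-element list, only used to count slice lengths.
columns = list(range(25))
nineteen = columns[0:19]  # indices 0..18
twenty = columns[0:20]    # indices 0..19

print(first_five)                  # [0, 1, 2, 3, 4]
print(len(nineteen), len(twenty))  # 19 20
```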

So if you change the second line of your code to:

clsfr=clsfr.fit(wordsDatum[:,0:20],wordsDatum[:,20])

It will then do what you expect: 0:20 selects columns 0 through 19, so all twenty features are included, whereas 0:19 stops at column 18 and passes only nineteen.
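To check this end to end, here is a minimal sketch on synthetic data. The random array words_datum only stands in for wordsDatum (the real data is not shown in the question), the label rule is invented, and random_state is added purely for reproducibility.

```python
import numpy as np
from sklearn import tree

# Synthetic stand-in for wordsDatum: 500 rows, 20 input columns + 1 label column.
rng = np.random.default_rng(0)
words_datum = rng.random((500, 21))
words_datum[:, 20] = words_datum[:, 0] > 0.5  # invented label rule, for illustration only

X_short = words_datum[:, 0:19]  # columns 0..18 -> 19 features (the original bug)
X_full = words_datum[:, 0:20]   # columns 0..19 -> 20 features (the fix)
y = words_datum[:, 20].astype(int)

clsfr = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf=50, random_state=0)
clsfr.fit(X_full, y)

print(X_short.shape[1], X_full.shape[1])  # 19 20
print(len(clsfr.feature_importances_))    # 20
```

If the label is always the last column, words_datum[:, :-1] selects every column except the last one and avoids hard-coding the feature count.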

score 0 · Answer 2 · answered Jun 23 '15 at 23:31

Thanks for your response! Yes, Python seems to have this quirk (?) of including the lower limit but excluding the upper limit of a range.

pythonCodeHelp
