I fitted a decision tree to a dataset with 20 inputs and 1 categorical output using the following Python code (wordsDatum is just an array containing the inputs in columns 0 to 19 and the output in column 20):

clsfr=tree.DecisionTreeClassifier(max_depth=2,min_samples_leaf=50)
clsfr=clsfr.fit(wordsDatum[:,0:19],wordsDatum[:,20])
for items in clsfr.feature_importances_:
    print(items)

When I print the feature importances, I only get 19 values - this is strange considering I have 20 features. Any ideas what might be going on here?

Thanks for your help!

DataAlgo

2 Answers

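This is due to how slicing works in Python: a slice start:stop includes the start index but excludes the stop index, and this applies to lists and NumPy arrays alike. You can find some good insights on this here.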

But in summary, if you define a list like this:

my_list = [0, 1, 2, 3, 4, 5]

and you call my_list[0:5], it will give you:

[0, 1, 2, 3, 4]

So if you change the second line of your code to:

clsfr=clsfr.fit(wordsDatum[:,0:20],wordsDatum[:,20])

It will do what you expect of it. It will include the first twenty features.
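To see the off-by-one in isolation, here is a minimal sketch using a plain Python list. The name `row` is a hypothetical stand-in for a single row of wordsDatum (20 inputs followed by 1 output); it is not from the original code, and the same stop-exclusive rule applies to the column slice of a NumPy array.

```python
# Hypothetical stand-in for one row of wordsDatum: 20 inputs plus 1 output.
row = list(range(21))

features_short = row[0:19]  # stop index 19 is excluded, so columns 0..18
features_full = row[0:20]   # stop index 20 is excluded, so columns 0..19
features_alt = row[:-1]     # everything except the last column
label = row[20]             # the output column

print(len(features_short))
print(len(features_full))
print(features_full == features_alt)
```

Writing the slice as `[:-1]` is one way to avoid hard-coding the column count, assuming the output is always the last column.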

oxtay

Thanks for your response! Yes, Python seems to have this quirk (?) of including the lower limit but excluding the upper limit of the range.