
I have a simple decision tree program in Python. Is there a way to find out (print) the most influential parameter(s) from X that caused the result? For example: the predicted result is "yes", and the most influential parameters are the [0] values in the items of X.

from sklearn import tree


X=[[100,3],[130,3],[80,2],[90,2],[140,3]]
Y=["yes","no","yes","yes","no"]

clf = tree.DecisionTreeClassifier()

clf = clf.fit(X,Y)

List1=[124,3]

prediction = clf.predict([List1])

print(prediction)
Aleksandar Beat

1 Answer


The feature_importances_ attribute can be used. From the documentation:

The feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature.

http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

print(clf.feature_importances_)
> [1. 0.]

Here the importance of the second feature is zero, which means that this feature is not used in any split of the tree.
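You can verify this by printing the learned rules. The following is a minimal sketch using the question's data; it assumes a recent scikit-learn (export_text was added in version 0.21) and fixes random_state only to make the fit reproducible:

```python
from sklearn import tree

# Same training data as in the question
X = [[100, 3], [130, 3], [80, 2], [90, 2], [140, 3]]
Y = ["yes", "no", "yes", "yes", "no"]

clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X, Y)

# Print the tree's split rules; only feature_0 appears in them,
# which is why the importances come out as [1. 0.]
print(tree.export_text(clf, feature_names=["feature_0", "feature_1"]))
print(clf.feature_importances_)
```

A single split on the first feature already separates "yes" from "no" perfectly here, so the second feature is never used and its importance stays at zero.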

Seljuk Gulcan
  • But what are 1. and 0. in [1. 0.]? I tried to predict the outcome for 4-5 different values and I always get the same result [1. 0.] – Aleksandar Beat Mar 13 '18 at 07:27
  • @AleksandarBeat Each value in the list is the sum of the weighted importance gain over the nodes where that feature is used to split. In a small dataset, it might not be so obvious. I'll try to add an example. You may draw the decision tree to see what these values actually mean. Look at this question: https://stackoverflow.com/questions/49170296/scikit-learn-feature-importance-calculation-in-decision-trees/49171133#49171133 and let me know if it is not clear. – Seljuk Gulcan Mar 13 '18 at 07:34