
While viewing this question scikit learn - feature importance calculation in decision trees, I have trouble understanding the value list of the Decision Tree. For example, the top node has value=[1,3]. What exactly are 1 and 3? Does it mean if X[2]<= 0.5, then 1 false, 3 true? If so, the value list is [number of false cases, number of true cases]. If so, what about the value lists of the leaves?

  1. Why do three right leaves have [0,1] and one left leaf has [1,0]?
  2. What does [1,0] or [0,1] mean anyway? One false zero true or zero false one true? But there's no condition on the leaves (like something <=.5). Then what is true what is false?

Your advice is highly appreciated!

Fred Chang

1 Answer


value=[1,3] means that at this exact node of the tree (before applying the split x[2] <= 0.5), you have:

  • 1 sample of class 0
  • 3 samples of class 1

As you go down the tree, you are filtering samples. The objective is to have perfectly separated classes, so you tend to end up with something like value=[0,1], which means that after applying all the splits, that leaf holds 0 samples of class 0 and 1 sample of class 1. So the list is indexed by class label, not by true/false: leaves have no condition of their own, they just report the class distribution of the samples that reached them.

You can also check that the entries of value always sum to samples. This makes complete sense, since value only tells you how the samples that arrived at that node are distributed among the classes.
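You can check this directly. The sketch below uses a made-up four-sample dataset (not the asker's data) chosen so that the root node also holds 1 sample of class 0 and 3 samples of class 1; it fits a small tree and prints each node's value next to its samples count:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: 1 sample of class 0 and 3 samples of class 1,
# perfectly separable on feature x[2].
X = np.array([[0, 0, 0],
              [0, 0, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([0, 1, 1, 1])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = clf.tree_

print(export_text(clf))  # shows each split and the class distribution per node

for node in range(tree.node_count):
    # value[node][0] is [class-0 share, class-1 share]. In older scikit-learn
    # versions these are raw counts; in recent versions (>= 1.4) they are
    # fractions of the node's samples. The proportions are the same either way.
    print(node, tree.value[node][0], tree.n_node_samples[node])
```

At the root you should see a 1:3 ratio between the two classes over 4 samples, and the leaves should be pure (all of one class), matching the [1,0] and [0,1] lists in the question.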

Alex Serra Marrugat