Decision Trees - How to interpret decision tree node splitting on graphed data

Question

I have a general question that's probably not fit for Stack Overflow. Apologies in advance.

In all the online articles they display these graphs. I understand Gini is used in entropy. How are values in the first line after <= generated?

The first decision node says petal length (cm) <= 2.45. I understand its literal meaning, but I don't understand how it's derived. A petal length less than or equal to 2.45 seems like an arbitrary value, and it doesn't make sense when the decision node on the following false path is petal length less than or equal to 1.75.

Extra credit: a good explanation of samples, value and class

Thanks!

Source: https://medium.com/geekculture/criterion-used-in-constructing-decision-tree-c89b7339600f

I think you are looking for https://ai.stackexchange.com/questions/tagged/machine-learning — RobertoT, May 26 '22 at 15:54
However, a quick answer: a decision tree algorithm is a mathematical algorithm that is fitted to discover threshold "values" that act as boundaries to divide the samples of your data among the target classes. In this case, the tree is saying that if the petal length is at most 2.45 cm the flower is setosa; otherwise, if the petal width is greater than 1.75 cm it is virginica. If not, it is versicolor — RobertoT, May 26 '22 at 15:56
The right branch doesn't look at petal length it looks at petal *width* — 0x263A, May 26 '22 at 15:58
You may want to look at https://stackoverflow.com/questions/40889344/decision-tree-using-continuous-variable — ThSorn, May 27 '22 at 18:39

score 0 · Answer 1 · answered May 26 '22 at 20:24

Decision trees have two types of nodes: leaf nodes and branch nodes. Branch nodes contain the splitting condition; leaf nodes give you the result of your classification/regression.
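As a rough illustration of branch versus leaf nodes, the tree discussed here can be read as nested conditions. This is only a sketch: the function name `predict` is hypothetical, the thresholds are the ones quoted in the question and comments, and the species assigned to each leaf is an assumption based on the usual iris example.

```python
def predict(petal_length_cm, petal_width_cm):
    """Walk a two-level tree: each `if` is a branch node, each `return` a leaf."""
    # Root branch node: petal length (cm) <= 2.45
    if petal_length_cm <= 2.45:
        return "setosa"        # leaf reached on the "true" path
    # Second branch node (on the "false" path): petal width (cm) <= 1.75
    if petal_width_cm <= 1.75:
        return "versicolor"    # leaf
    return "virginica"         # leaf


print(predict(1.4, 0.2))
print(predict(4.5, 1.5))
print(predict(5.5, 2.0))
```

Each call follows exactly one path from the root to a leaf, which is all that prediction with a fitted tree amounts to.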

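The statement "Gini is used in entropy" is incorrect. Gini impurity and entropy are two separate metrics; either one can be used to measure how much a split on a given condition improves the purity of the resulting nodes (the information gain). For a numeric feature such as petal length, the candidate conditions are thresholds lying between the values observed in the data, which is where a number like 2.45 comes from. The decision tree then keeps the split that resulted in the highest information gain.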
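To make the threshold search concrete, here is a minimal pure-Python sketch. The helper names `gini` and `best_threshold` and the eight-row toy data set are made up for illustration. Taking midpoints between consecutive distinct values as candidates is one common convention and is assumed here; it is not necessarily what every library does.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best `value <= threshold` split."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    # Candidate thresholds: midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        # Impurity of the two children, weighted by how many samples each gets.
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy data (hypothetical): petal lengths in cm and their species.
petal_length = [1.4, 1.3, 1.9, 3.0, 4.5, 5.1, 4.7, 6.0]
species = ["setosa", "setosa", "setosa", "versicolor",
           "versicolor", "virginica", "versicolor", "virginica"]

t, score = best_threshold(petal_length, species)
print(f"best split: petal length <= {t:.2f} (weighted Gini {score:.3f})")
```

On this toy data the winning threshold falls halfway between 1.9 and 3.0, the largest setosa value and the smallest non-setosa value, so the value is not arbitrary: it is the gap in the data that separates the classes most cleanly.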

class is the label of your data points (in this case there are 3 labels for 3 different flower species). The class shown in a node is the most common label among the data points in that node.

samples indicates how many data points ended up in that node.

value is a vector which represents how the data points are distributed in terms of their classes. For instance, [0, 49, 5] means there are 0 data points with label 0, 49 data points with label 1, and 5 data points with label 2.
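The relationship between these fields can be checked by hand. The sketch below starts from the value vector [0, 49, 5] and derives samples, class and the Gini impurity of that node. The order of `class_names` is an assumption (label 0 = setosa, 1 = versicolor, 2 = virginica).

```python
# Assumed label order: 0 = setosa, 1 = versicolor, 2 = virginica.
class_names = ["setosa", "versicolor", "virginica"]
value = [0, 49, 5]  # per-class counts of the data points in this node

# samples: total number of data points that ended up in the node.
samples = sum(value)

# class: the most common label in the node.
majority = class_names[value.index(max(value))]

# gini: 1 minus the sum of squared class proportions.
gini = 1.0 - sum((count / samples) ** 2 for count in value)

print(samples, majority, round(gini, 3))  # prints: 54 versicolor 0.168
```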
