Suppose I have the following DecisionTreeClassifier model:
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
bunch = load_breast_cancer()
X, y = bunch.data, bunch.target
model = DecisionTreeClassifier(random_state=100)
model.fit(X, y)
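For context, here's the kind of single, final prediction I want to explain (a minimal check of the model fitted above; since the tree is grown to purity by default, the prediction for a training sample should match its label):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

bunch = load_breast_cancer()
X, y = bunch.data, bunch.target
model = DecisionTreeClassifier(random_state=100).fit(X, y)

# The single "ultimate" prediction for the first training sample.
pred = model.predict(X[:1])[0]
print(pred)
```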
I want to traverse each node (both leaf and decision nodes) in this tree and determine how the predicted value changes as the tree is traversed. Basically, I'd like to be able to tell, for a given sample, how the ultimate prediction (what's returned by .predict) is determined. So maybe the sample is ultimately predicted 1, but it traverses four nodes, and at each node its "constant" prediction (the language used in the scikit-learn docs) goes from 1 to 0 to 0 to 1 again.
It's not immediately apparent how I'd get that information from model.tree_.value, which is described as:
| value : array of double, shape [node_count, n_outputs, max_n_classes]
| Contains the constant prediction value of each node.
and which, in the case of this model, looks like:
>>> model.tree_.value.shape
(43, 1, 2)
>>> model.tree_.value
array([[[212., 357.]],
       [[ 33., 346.]],
       [[  5., 328.]],
       [[  4., 328.]],
       [[  2., 317.]],
       [[  1.,   6.]],
       [[  1.,   0.]],
       [[  0.,   6.]],
       [[  1., 311.]],
       [[  0., 292.]],
       [[  1.,  19.]],
       [[  1.,   0.]],
       [[  0.,  19.]],
       ...
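Incidentally, the first row appears to be the per-class counts of the training targets, since the root node sees every sample. A quick sanity check (assuming tree_.value stores class counts in this version of scikit-learn; I gather newer releases may store per-node fractions instead, though the argmax would be unchanged):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

bunch = load_breast_cancer()
X, y = bunch.data, bunch.target
model = DecisionTreeClassifier(random_state=100).fit(X, y)

counts = np.bincount(y)            # class counts of the training targets
root_row = model.tree_.value[0][0]  # root node's row of tree_.value
print(counts)
print(root_row)
```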
Does anyone know how I could accomplish this? Would the class prediction for each of the 43 nodes above just be the argmax of each node's row? That is, 1, 1, 1, 1, 1, 1, 0, 1, ..., going from top to bottom?
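For what it's worth, here's roughly what I imagine the traversal looking like: use decision_path to get the node ids a sample visits, then take the argmax of tree_.value at each of them. A sketch, not a confirmed answer:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

bunch = load_breast_cancer()
X, y = bunch.data, bunch.target
model = DecisionTreeClassifier(random_state=100).fit(X, y)

sample = X[:1]
# decision_path returns a sparse indicator matrix; for a single-row input,
# its nonzero column indices are the node ids visited from root to leaf
# (node ids increase along any root-to-leaf path, so this order is correct).
node_ids = model.decision_path(sample).indices
# The "constant" prediction at each visited node: argmax over its class row.
path_preds = [int(np.argmax(model.tree_.value[n])) for n in node_ids]
print(node_ids)
print(path_preds)
print(model.predict(sample)[0])  # should equal path_preds[-1]
```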