I want to retrieve the path that each instance takes through a decision tree or a RandomForest. For instance, I need output like this:
# 1 1 3 4 8 NA NA
# 2 1 2 5 7 11 NA
# 3 1 3 4 9 10 13
# 4 1 3 4 8 NA NA
# etc
This means that instance #1 passes through nodes 1, 3, and 4 and ends in terminal node 8, and so forth. Some instances naturally have shorter paths than others, so their rows are padded with NA.
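Because the paths have different lengths, one way to get this rectangular layout is to pad the shorter paths with a missing-value marker. Below is a minimal sketch. It assumes the paths are already available as Python lists of node ids (the values are just the example rows above, without the leading instance number), and it uses NaN as the NA marker. The helper name `pad_paths` is hypothetical.

```python
import numpy as np

def pad_paths(paths, fill=np.nan):
    """Stack variable-length node paths into one 2-D float array, padding the tail with `fill`."""
    width = max(len(p) for p in paths)
    out = np.full((len(paths), width), fill, dtype=float)
    for i, p in enumerate(paths):
        out[i, :len(p)] = p  # copy the path, leave the rest as `fill`
    return out

# The four example rows from above (without the leading instance number).
paths = [[1, 3, 4, 8], [1, 2, 5, 7, 11], [1, 3, 4, 9, 10, 13], [1, 3, 4, 8]]
padded = pad_paths(paths)
print(padded)
```

NaN forces a float array. That is the usual NumPy trade-off for representing NA.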
I used `decision_path`, but it returns a sparse matrix that I cannot interpret, and I cannot find such a path in it. I cannot even read the output. Here is sample code for the Iris dataset:
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the Iris data
iris = load_iris()
xtrain = iris.data
ytrain = iris.target

# Fit a decision tree (fit returns the estimator itself)
dtree = DecisionTreeClassifier()
fitted_tree = dtree.fit(X=xtrain, y=ytrain)
predictiontree = dtree.predict(xtrain)

# Node-indicator matrix describing the path each sample takes
fitted_tree.decision_path(xtrain)
```
The output is this:
<150x17 sparse matrix of type '<class 'numpy.int64'>'
with 560 stored elements in Compressed Sparse Row format>
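That object is a node-indicator matrix. It has 150 rows (one per sample) and 17 columns (one per tree node). Entry (i, j) is a stored 1 whenever sample i passes through node j, so the 560 stored elements are the total number of node visits. Below is a minimal sketch of reading it row by row.

A few assumptions apply. `random_state=0` is added only for reproducibility, so the node count may differ from the 17 above. Scikit-learn numbers nodes from 0 (the root), not from 1 as in the example at the top. Sorting a row's node ids relies on scikit-learn giving every child node a larger id than its parent.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
# random_state is added only to make the fitted tree reproducible.
dtree = DecisionTreeClassifier(random_state=0).fit(X, y)

# CSR node-indicator matrix, shape (n_samples, n_nodes): a stored 1 at (i, j)
# means sample i passed through node j.
indicator = dtree.decision_path(X).tocsr()

# Small dense view of the first three rows, for inspection only.
print(indicator[:3].toarray())

# The node ids visited by sample i are the column indices stored in
# indices[indptr[i]:indptr[i + 1]]. Sorting them gives root-to-leaf order
# because a child always has a larger node id than its parent.
paths = [
    np.sort(indicator.indices[indicator.indptr[i]:indicator.indptr[i + 1]])
    for i in range(indicator.shape[0])
]
print(paths[0])  # 0-based node ids, root first, leaf last
```

The last id of each path is the leaf, which is also what `dtree.apply(X)` returns for that sample.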
Please help me turn this into a matrix like the one at the top. I have no idea how to handle a sparse matrix.
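For a RandomForest, `decision_path` returns a pair `(indicator, n_nodes_ptr)`. The columns of `indicator` from `n_nodes_ptr[k]` to `n_nodes_ptr[k + 1]` belong to tree k. Below is a minimal sketch that splits one sample's row into per-tree paths. It assumes a small forest (`n_estimators=3` and `random_state=0` are chosen only for illustration). The helper `forest_paths` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target
forest = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)

# indicator: (n_samples, total nodes over all trees);
# n_nodes_ptr: column offsets, tree k owns columns n_nodes_ptr[k]:n_nodes_ptr[k + 1].
indicator, n_nodes_ptr = forest.decision_path(X)
indicator = indicator.tocsr()

def forest_paths(sample):
    """Return one array per tree with the (tree-local, 0-based) node ids visited by `sample`."""
    row = indicator.indices[indicator.indptr[sample]:indicator.indptr[sample + 1]]
    paths = []
    for k in range(len(n_nodes_ptr) - 1):
        lo, hi = n_nodes_ptr[k], n_nodes_ptr[k + 1]
        in_tree = row[(row >= lo) & (row < hi)]  # columns belonging to tree k
        paths.append(np.sort(in_tree) - lo)      # shift back to tree-local ids
    return paths

for k, path in enumerate(forest_paths(0)):
    print(f"tree {k}: {path}")
```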