I have created a decision tree using rpart, and I am wondering how to find exactly which cases of the training data fall into each terminal node.
I followed the answer in this link: How to count the observations falling in each node of a tree, but for some reason the $where component only gives me a vector of terminal nodes, without the row numbers indicating which case corresponds to which terminal node. If I do exactly the same thing with a tree built using the tree package, I get the row numbers (identifying each case) together with the corresponding terminal node. The only difference I noticed is that for the rpart tree, $where is an "int" vector, while for the tree package it is a "Named int" vector. How can I produce the same "Named int" vector for a tree made with rpart?
I have also tried the answer suggested in: Find the data elements in a data frame that pass the rule for a node in a tree model? It does not work for me because rpart dropped 16 observations while fitting the model, so the number of observations in the resulting model does not match the original data frame used to create it.
Sorry if the answer seems obvious, newbie R user here!
Here is the code I used to create the tree; it's a tree used to predict a diagnosis of autism based on behavioural profiles:
library(caret)  # createDataPartition
library(rpart)  # rpart, prune, plotcp, printcp
set.seed(565808016)
inTrain21<- createDataPartition(clinicaldiagnosis, p=0.75, list=FALSE)
training_data21<- Decisiontree4[ inTrain21,]
testing_data21<- Decisiontree4[-inTrain21,]
test_clinicaldiagnosis21<-clinicaldiagnosis[-inTrain21]
lossmatrix=matrix(c(0,1,1,1,0,1,2,1,0), ncol=3, nrow=3)
set.seed(591251974)
tree_model22= rpart(clinicaldiagnosis~ Visualtracking + etc etc, training_data21, na.action=na.rpart, method="class", control=rpart.control(cp=0.00001), parms=list(loss=lossmatrix))
plot(tree_model22, uniform=TRUE, margin=0.05)
text(tree_model22, use.n=TRUE, pretty=0)
plotcp(tree_model22)
printcp(tree_model22)
pruned_model22=prune(tree_model22, cp=0.0146341)
plot(pruned_model22, uniform=TRUE, margin=0.1)
text(pruned_model22, use.n=TRUE, cex=0.85, splits=TRUE, pretty=0)
tree_pred22=predict(pruned_model22, testing_data21, type="class")
table(tree_pred22, test_clinicaldiagnosis21)
trainingnodes22 <- rownames(pruned_model22$frame)[pruned_model22$where] # this only gives a vector of terminal node numbers, without the corresponding row names
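
One possible way to keep the row names attached, sketched on the kyphosis data that ships with rpart rather than on the clinical data above: refit with model = TRUE so that the fitted object stores the model frame after NA handling, then use its row names to label $where. The toy data frame, the "caseN" row names, the injected NAs, and the cp value of 0.05 are all illustrative assumptions and not part of the original analysis.

```r
library(rpart)

# Toy stand-in for the training data (assumption: not the clinical data set).
toy <- kyphosis
rownames(toy) <- paste0("case", seq_len(nrow(toy)))
toy$Kyphosis[c(3, 10)] <- NA  # rows with a missing response are dropped by na.rpart

# model = TRUE keeps the model frame (only the rows actually used) in the fit.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = toy,
             method = "class", model = TRUE)
pruned <- prune(fit, cp = 0.05)  # illustrative cp value

# $where indexes rows of $frame; the row names of $frame are the node numbers.
# Label each entry with the row name of the case it belongs to.
node_of_case <- setNames(rownames(pruned$frame)[pruned$where],
                         rownames(fit$model))

# Row names of the training cases, grouped by terminal node.
cases_by_node <- split(names(node_of_case), node_of_case)
```

Indexing cases_by_node by a node number (as a character string) then lists the cases in that leaf, and the dropped rows simply never appear in names(node_of_case).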