I have a dataset with 277 observations and a binary response variable: 0 indicates no disease and 1 indicates disease. I know that 180 of the observations have no disease and 97 have the disease. I built a model and constructed a classification tree to see how well it predicts who has the disease and who doesn't. I used the rpart function to construct the tree and ran a summary on it:
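library(rpart)
mytree <- rpart(y ~ x1 + x2 + x3 + x4, method = "class")  # classification tree
summary(mytree)  # summarise the fitted object (mytree, not tree)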
My question is: how do I know what percentage of the data is classified correctly at each tip (terminal node)? Suppose my output is as follows:
Node number 1: 277 observations,    complexity param=0.134
  predicted class=0  expected loss=0.35  P(node) =1
    class counts:   180    97
   probabilities: 0.650 0.350
  left son=2 (156 obs) right son=3 (121 obs)
  Primary splits:
      x1 < 1.73 to the left,  improve=17.80, (0 missing)
      x3 < 1.44 to the left,  improve=17.80, (0 missing)
      x2 < 1.35 to the left,  improve=16.40, (0 missing)
      x4 < 3.5  to the left,  improve= 1.36, (0 missing)
  Surrogate splits:
      x2 < 1.35 to the left,  agree=0.751, adj=0.430, (0 split)
      x3 < 1.44 to the left,  agree=0.653, adj=0.207, (0 split)
      x4 < 3.5  to the right, agree=0.578, adj=0.033, (0 split)

Node number 2: 156 observations,    complexity param=0.0258
  predicted class=0  expected loss=0.192  P(node) =0.563
    class counts:   126    30
   probabilities: 0.808 0.192
  left son=4 (133 obs) right son=5 (23 obs)
  Primary splits:
      x3 < 1.6  to the left,  improve=4.410, (0 missing)
      x2 < 1.83 to the left,  improve=3.990, (0 missing)
      x1 < 1.27 to the left,  improve=1.410, (0 missing)
      x4 < 4.5  to the left,  improve=0.999, (0 missing)

Node number 4: 133 observations
  predicted class=0  expected loss=0.143  P(node) =0.48
    class counts:   114    19
   probabilities: 0.857 0.143
Note that node number 4 appears to split into two tips, one of which has 114 observations (and is terminal). It classified 114 of the 133 observations as 0. How can I tell how many of those 114 are correctly classified as 0? Any insight would be greatly appreciated.
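One way to explore this is to look at the per-node table that rpart stores in the fitted object. In rpart's summary, the "class counts" line gives the observed classes of the training observations that fall in a node, so it can be compared directly with the node's predicted class. The following is only a sketch under stated assumptions: the original data is not available here, so it uses the `kyphosis` data frame that ships with rpart as a stand-in, and the names `fit`, `leaves`, `confusion`, and the `node4_*` variables are made up for illustration. It also assumes the default case weights and loss matrix.

```r
library(rpart)  # recommended package, ships with standard R distributions

# Stand-in data: `kyphosis` comes with rpart; it is NOT the 277-row dataset above.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# One row per node; terminal nodes are marked "<leaf>" in the `var` column.
# Row names of the frame are the node numbers used by summary().
leaves <- fit$frame[fit$frame$var == "<leaf>", c("n", "dev", "yval")]

# For method = "class" with default weights and losses, `dev` is the number of
# observations in the node whose observed class differs from the predicted one.
leaves$correct     <- leaves$n - leaves$dev
leaves$pct_correct <- round(100 * leaves$correct / leaves$n, 1)
print(leaves)

# Overall check: observed vs. predicted class on the training data.
confusion <- table(observed  = kyphosis$Kyphosis,
                   predicted = predict(fit, type = "class"))
print(confusion)

# The same arithmetic applied to node 4 of the posted output
# (class counts 114 and 19, predicted class 0).
node4_counts  <- c(`0` = 114, `1` = 19)
node4_correct <- max(node4_counts)                    # majority class is the prediction
node4_loss    <- 1 - node4_correct / sum(node4_counts)  # matches expected loss 0.143
```

Read this way, the 133 observations in node 4 are all predicted as 0; the 114 that really are 0 are the correct ones and the 19 that are really 1 are the misclassified ones, which is where the expected loss of 19/133 comes from. With the real data, the same steps would apply to `mytree` in place of `fit`.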