
I'm struggling to understand the output of classification trees in rpart. I don't understand how the 'root node error' (one of the outputs of the printcp function) is calculated, and I couldn't find its definition in the rpart package documentation either.

As an example, I loaded the Titanic data:

library(titanic)
library(rpart)

tt<-titanic_train
table(tt$Survived)

So we have 549 passengers who died (Survived = 0) and 342 who survived (Survived = 1), 891 passengers in total.

fit <- rpart(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked, data = tt)
printcp(fit)

This gives the following result:

Regression tree:
rpart(formula = Survived ~ Pclass + Sex + Age + SibSp + Parch + 
    Fare + Embarked, data = tt)

Variables actually used in tree construction:
[1] Age    Fare   Pclass Sex    SibSp 

Root node error: 210.73/891 = 0.23651

n= 891 

        CP nsplit rel error  xerror     xstd
1 0.295231      0   1.00000 1.00538 0.016124
2 0.073942      1   0.70477 0.70896 0.033228
3 0.027124      2   0.63083 0.63570 0.031752
4 0.026299      3   0.60370 0.62105 0.032815
5 0.023849      4   0.57740 0.61154 0.032884
6 0.021091      5   0.55356 0.58294 0.032127
7 0.010000      6   0.53246 0.57097 0.032402

Does the root node error here mean the misclassification error at the beginning, before any splits are made? If I predict the majority class for everyone, I will be wrong in 342 of the 891 cases, so the root node error should be 342/891. But the output shows 210.73/891.
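One way to check where 210.73 might come from is to rebuild the 0/1 `Survived` column from the counts in the table and compute two candidate errors for the root node: the misclassification count and the sum of squared deviations from the mean. This minimal sketch assumes only those counts (549 zeros and 342 ones) and does not call rpart. Algebraically, the second quantity is n·p·(1 − p) with p = 342/891, i.e. 342 × 549 / 891 ≈ 210.73.

```r
# Assumption: rebuild the Survived column from the counts printed by
# table(tt$Survived): 549 passengers with 0 (died), 342 with 1 (survived).
survived <- c(rep(0, 549), rep(1, 342))
n <- length(survived)  # 891

# Candidate 1: misclassification error when every passenger is assigned
# the majority class (0 = died).
majority <- as.numeric(names(which.max(table(survived))))
misclass_count <- sum(survived != majority)  # 342
misclass_rate <- misclass_count / n          # about 0.38384

# Candidate 2: sum of squared deviations from the mean survival rate,
# the node error used by a regression (anova) tree.
sse <- sum((survived - mean(survived))^2)    # about 210.73
sse_rate <- sse / n                          # about 0.23651

cat(sprintf("misclassification: %d/%d = %.5f\n", misclass_count, n, misclass_rate))
cat(sprintf("sum of squares:    %.2f/%d = %.5f\n", sse, n, sse_rate))
```

The second line matches the printcp output above, which suggests printcp is reporting a sum of squares for this fit rather than a misclassification count.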

I would be grateful for help understanding what 210.73 in the root node error means and how it was calculated for this Titanic data. I searched for it all day and couldn't find any explanation.

Thank you in advance for your help.

michalk
  • Possible duplicate of [How to compute error rate from a decision tree?](http://stackoverflow.com/questions/9666212/how-to-compute-error-rate-from-a-decision-tree) ... Please search Stack Overflow before posting. – Tim Biegeleisen Feb 25 '16 at 12:13
  • I saw that post. It doesn't explain what the root node error is or how it is calculated; it only mentions what it is used for (as an input to calculate the resubstitution error rate and the cross-validated error rate). – michalk Feb 25 '16 at 12:17

1 Answer


Root node error is the fraction of records that are incorrectly classified at the first (root) node, i.e. before any split is made.
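The definition above is the one for classification trees. The printcp output in the question, however, starts with "Regression tree:". Because Survived is stored as 0/1 numbers rather than a factor, rpart falls back to `method = "anova"` when no method is given, and for an anova tree the node error is the residual sum of squares, so the root node error becomes 210.73/891 instead of 342/891. The sketch below illustrates this on synthetic data. It assumes a hypothetical data frame `d` with the same 549/342 split and one random predictor `x` (so it runs without the titanic package); `fit_anova` and `fit_class` are illustrative names.

```r
library(rpart)  # rpart is a recommended package shipped with R

# Hypothetical stand-in for titanic_train: only the 549/342 split of the
# response matches the question; the predictor x is random noise.
set.seed(42)
d <- data.frame(
  survived = c(rep(0L, 549), rep(1L, 342)),
  x = runif(891)
)

# Numeric 0/1 response and no method argument: rpart guesses "anova",
# so node deviance is the sum of squares around the node mean.
fit_anova <- rpart(survived ~ x, data = d)

# Factor response: rpart guesses "class", so node deviance is the
# number of misclassified records (with default priors and losses).
fit_class <- rpart(factor(survived) ~ x, data = d)

print(fit_anova$method)        # "anova"
print(fit_class$method)        # "class"
print(fit_anova$frame$dev[1])  # root node: about 210.73
print(fit_class$frame$dev[1])  # root node: 342

# printcp divides the root node deviance by the number of observations,
# so this should report a root node error of 342/891.
printcp(fit_class)
```

With the real data, fitting `rpart(factor(Survived) ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked, data = tt)` (or passing `method = "class"`) should give a classification tree whose printcp output reports a root node error of 342/891.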

For more information, see Understanding the Outputs of the Decision Tree Tool.

Sarah Grogan