I have a largely categorical dataframe from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008
I am using rpart to build a decision tree for a newly derived binary outcome, Failed, which records whether a patient is readmitted within 30 days.
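For context, Failed was derived roughly as follows (a sketch; I'm assuming the raw UCI file diabetic_data.csv with its readmitted column taking the values "<30", ">30" and "NO"):

# Sketch: derive the Failed outcome from the raw readmitted column
# (assumes diabetic_data.csv from UCI, where "?" marks missing values)
diabetes <- read.csv("diabetic_data.csv", na.strings = "?")
diabetes$Failed <- factor(ifelse(diabetes$readmitted == "<30", "Yes", "No"))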
I am fitting the tree with the following call:
library(rpart)

tree_model <- rpart(
  Failed ~ race + gender + age + time_in_hospital + medical_specialty +
    num_lab_procedures + num_procedures + num_medications +
    number_outpatient + number_emergency + number_inpatient +
    number_diagnoses + max_glu_serum + A1Cresult + metformin +
    glimepiride + glipizide + glyburide + pioglitazone + rosiglitazone +
    insulin + change,
  method  = "class",
  data    = training_data,
  control = rpart.control(minsplit = 2, cp = 0.0001, maxdepth = 20, xval = 10),
  parms   = list(split = "gini")
)
Printing the complexity table with printcp(tree_model) yields:
           CP nsplit rel error xerror     xstd
1  0.00065883      0   1.00000 1.0000 0.018518
2  0.00057648      8   0.99424 1.0038 0.018549
3  0.00025621     10   0.99308 1.0031 0.018543
4  0.00020000     13   0.99231 1.0031 0.018543
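Plotting the table with plotcp (also part of rpart) shows the same pattern visually:

plotcp(tree_model)  # cross-validated error against cp/tree size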
I see that the relative error (rel error) decreases as the tree branches off, but the cross-validated error (xerror) goes up. I don't understand this: I would have expected the error to fall as the tree gains more branches and becomes more complex.
I take it that xerror is the figure that matters, since most pruning methods based on it would cut this tree back to the root.

Why is xerror the quantity to focus on when pruning the tree? And when we summarise the error of the decision tree classifier, is the error 0.99231 or 1.0031?
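In case it helps frame the question, this is the pruning workflow I had in mind (a sketch built on rpart's cptable; choosing cp by the minimum xerror, or by the one-SE rule, is my assumption of standard practice):

cp_table <- tree_model$cptable

# Option 1: prune at the cp with the lowest cross-validated error
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned  <- prune(tree_model, cp = best_cp)

# Option 2: one-SE rule - the largest cp whose xerror lies within one
# standard error (xstd) of the minimum xerror
min_row    <- which.min(cp_table[, "xerror"])
threshold  <- cp_table[min_row, "xerror"] + cp_table[min_row, "xstd"]
cp_1se     <- cp_table[cp_table[, "xerror"] <= threshold, "CP"][1]
pruned_1se <- prune(tree_model, cp = cp_1se)

With the table above, both rules select row 1 (nsplit = 0), i.e. they keep only the root, which is what I mean by pruning cutting the tree at the root.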