Decision tree and error matrix calculations

Question

I've created a decision tree using rpart and the code below:

res.tree <- rpart(myformula, data = credit_train)

My data has been split into two parts: a training set (70%) and a testing set (30%).

This part works well and my tree is created. Where I'm getting stuck is with the prediction so that I can calculate my confusion matrix and ROC curves.

I'm using this code:

tree_pred = predict(res.tree, credit_test, type = "class")

but I get this message:

Error in predict.rpart(res.tree, credit_test, type = "class") : Invalid prediction for "rpart" object

In addition:

Warning message:

 'newdata' had 271 rows but variables found have 729 rows

I can't figure out whether I'm missing a library, why type = "class" isn't recognized (so many resources say I need to use it), or why I'm getting a mismatch in the rows.

The 271 rows reported for 'newdata' match my testing data set; my training data set has 729 rows.

Is the decision tree creation causing my problem or could it be the prediction code?

Responding to comments: I'm using the following libraries:

library(readxl)
library(dplyr)
library(factoextra)
library(corrplot)
library(rpart)
library(rpart.plot)
library(RColorBrewer)
library(pROC)
library(Hmisc)
library(fBasics)
library(rattle)
library(caret)

A sample of my data:

structure(list(CHK_ACCT = c(0, 1, 0, 0), DURATION = c(6, 48, 
42, 24), HISTORY = c(4, 2, 2, 3), NEW_CAR = c(0, 0, 0, 1), USED_CAR = c(0, 
0, 0, 0), FURNITURE = c(0, 0, 1, 0), `RADIO/TV` = c(1, 1, 0, 
0), EDUCATION = c(0, 0, 0, 0), RETRAINING = c(0, 0, 0, 0), AMOUNT = c(1169, 
5951, 7882, 4870), SAV_ACCT = c(4, 0, 0, 0), EMPLOYMENT = c(4, 
2, 3, 2), INSTALL_RATE = c(4, 2, 2, 3), MALE_DIV = c(0, 0, 0, 
0), MALE_SINGLE = c(1, 0, 1, 1), MALE_MAR_or_WID = c(0, 0, 0, 
0), `CO-APPLICANT` = c(0, 0, 0, 0), GUARANTOR = c(0, 0, 1, 0), 
PRESENT_RESIDENT = c(4, 2, 4, 4), REAL_ESTATE = c(1, 1, 0, 
0), PROP_UNKN_NONE = c(0, 0, 0, 1), AGE = c(67, 22, 45, 53
), OTHER_INSTALL = c(0, 0, 0, 0), RENT = c(0, 0, 0, 0), OWN_RES = c(1, 
1, 0, 0), NUM_CREDITS = c(2, 1, 1, 2), JOB = c(2, 2, 2, 2
), NUM_DEPENDENTS = c(1, 1, 2, 2), TELEPHONE = c(1, 0, 0, 
0), FOREIGN = c(0, 0, 0, 0), DEFAULT = c(0, 1, 0, 1), CHK_ACCT_rec = c(1, 
2, 1, 1), SAV_ACCT_rec = c(0, 1, 1, 1)), .Names = c("CHK_ACCT", 
"DURATION", "HISTORY", "NEW_CAR", "USED_CAR", "FURNITURE", "RADIO/TV", 
"EDUCATION", "RETRAINING", "AMOUNT", "SAV_ACCT", "EMPLOYMENT", 
"INSTALL_RATE", "MALE_DIV", "MALE_SINGLE", "MALE_MAR_or_WID", 
"CO-APPLICANT", "GUARANTOR", "PRESENT_RESIDENT", "REAL_ESTATE", 
"PROP_UNKN_NONE", "AGE", "OTHER_INSTALL", "RENT", "OWN_RES", 
"NUM_CREDITS", "JOB", "NUM_DEPENDENTS", "TELEPHONE", "FOREIGN", 
"DEFAULT", "CHK_ACCT_rec", "SAV_ACCT_rec"), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))


myformula = credit_train$DEFAULT ~ credit_train$CHK_ACCT_rec + 
credit_train$DURATION + credit_train$HISTORY + credit_train$NEW_CAR + 
credit_train$USED_CAR + credit_train$FURNITURE + credit_train$`RADIO/TV` + 
credit_train$EDUCATION + credit_train$RETRAINING + credit_train$AMOUNT + 
credit_train$SAV_ACCT_rec + credit_train$EMPLOYMENT + 
credit_train$INSTALL_RATE + credit_train$MALE_DIV + credit_train$MALE_SINGLE + 
credit_train$MALE_MAR_or_WID + credit_train$`CO-APPLICANT` + 
credit_train$GUARANTOR + credit_train$PRESENT_RESIDENT + 
credit_train$REAL_ESTATE + credit_train$PROP_UNKN_NONE + credit_train$AGE +  
credit_train$OTHER_INSTALL + credit_train$RENT + credit_train$OWN_RES + 
credit_train$NUM_CREDITS + credit_train$JOB + credit_train$NUM_DEPENDENTS + 
credit_train$TELEPHONE + credit_train$FOREIGN

@calimo I hope this is what you needed.
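
A note on the formula above: every term is written as credit_train$..., so predict() will likely keep looking up the training vectors instead of the columns of newdata, which is consistent with the warning about 271 versus 729 rows. The sketch below is an illustration under assumptions, not the real analysis: the credit data frame, its three columns, and the 70/30 split are synthetic stand-ins. It writes the formula with bare column names so that data = and newdata = decide where the variables come from.

```r
library(rpart)

set.seed(42)

# Hypothetical stand-in for the credit data (not the real data set)
n <- 1000
credit <- data.frame(
  DURATION = sample(6:48, n, replace = TRUE),
  AMOUNT   = round(runif(n, 500, 8000))
)
# Synthetic response loosely tied to DURATION, stored as a factor
credit$DEFAULT <- factor(as.integer(credit$DURATION + rnorm(n, sd = 8) > 30))

# 70/30 split into training and testing rows
train_idx    <- sample(n, size = 0.7 * n)
credit_train <- credit[train_idx, ]
credit_test  <- credit[-train_idx, ]

# Bare column names: rpart() and predict() resolve them in whichever
# data frame is passed via data = / newdata =
myformula <- DEFAULT ~ DURATION + AMOUNT

res.tree  <- rpart(myformula, data = credit_train, method = "class")
tree_pred <- predict(res.tree, newdata = credit_test, type = "class")

# One prediction per test row, so the confusion matrix lines up
length(tree_pred) == nrow(credit_test)
table(Predicted = tree_pred, Actual = credit_test$DEFAULT)
```

With the real data, the same pattern would mean listing the column names without the credit_train$ prefix (keeping backticks around names such as `RADIO/TV`) and passing credit_test as newdata.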

It's hard to say without seeing your data/model. Is this a classification or a regression tree? It might be because you are using type = "class" for a continuous response variable. — Edgar Santos, Dec 21 '17 at 00:51
It works for me... you'll need to add a reproducing example if you want some help, see here for some guidelines: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Calimo, Dec 21 '17 at 08:58
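
The first comment's suggestion can be checked directly. As a minimal sketch, assuming a made-up data frame (toy and its columns are hypothetical), rpart() grows a regression tree when the response is numeric and a classification tree when it is a factor, and type = "class" is only valid for the latter. Since DEFAULT in the sample data is stored as numeric 0/1, this may be what triggers the "Invalid prediction" error.

```r
library(rpart)

set.seed(1)

# Hypothetical data: numeric 0/1 response driven by DURATION
toy <- data.frame(DURATION = sample(6:48, 300, replace = TRUE))
toy$DEFAULT <- as.numeric(toy$DURATION + rnorm(300, sd = 5) > 24)

# Numeric response: rpart() defaults to a regression tree (method "anova")
fit_num <- rpart(DEFAULT ~ DURATION, data = toy)
fit_num$method

# type = "class" is rejected for a regression tree; capture the error message
err <- tryCatch(
  predict(fit_num, toy, type = "class"),
  error = function(e) conditionMessage(e)
)
err

# Factor response: rpart() builds a classification tree (method "class")
toy$DEFAULT <- factor(toy$DEFAULT)
fit_cls <- rpart(DEFAULT ~ DURATION, data = toy)
fit_cls$method
head(predict(fit_cls, toy, type = "class"))
```

Converting the response with factor() before fitting, or passing method = "class" to rpart(), are the two usual ways to get a classification tree from a 0/1 column.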
