I am having trouble creating a ROC curve for a decision tree built with the rpart package. My goal is to predict "y", the success of the bank's marketing campaign, which takes the value "yes" or "no". How should I approach the next step, plotting the ROC curve?

Here is the R code I have so far:

library(caTools)
library(rpart)
library(rpart.plot)

set.seed(1234)
# 75/25 train/test split that preserves the class ratio of y
is.train <- sample.split(bank$y, SplitRatio = 0.75)
train <- subset(bank, is.train == TRUE)
test <- subset(bank, is.train == FALSE)

tree <- rpart(y ~ ., method = "class", data = train)
# For a classification tree, predict() returns class probabilities by default
tree.preds <- as.data.frame(predict(tree, test))
# Label a case "yes" when its predicted probability is at least 0.5
tree.preds$y <- ifelse(tree.preds$yes >= 0.5, "yes", "no")
table(tree.preds$y, test$y)
prp(tree)
Calimo
Meax
  • I edited the question a bit to make it more suitable for the site. You should always ask a clear question rather than leave it open as you did. – Calimo Jun 16 '20 at 06:29

1 Answer

First, for ROC analysis you will want numeric predictions, such as probabilities:

predict(tree, test, type="prob")

If your variable has "yes" and "no" as answers, you will get two columns, labeled accordingly. I will assume that "yes" is the second one, and save that column as the predictions:

tree.preds <- predict(tree, test, type="prob")[, 2]
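If you would rather not rely on the column order, you can select the column by name, since the columns are labeled with the factor levels of the outcome. Here is a minimal sketch of that idea. The bank data is not available here, so it uses the kyphosis data set that ships with rpart as a stand-in (an assumption for illustration only), where "present" plays the role of "yes":

```r
library(rpart)

# kyphosis ships with rpart; its outcome has the levels "absent" and "present"
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# One probability column per class
probs <- predict(fit, kyphosis, type = "prob")
colnames(probs)  # the column names are the factor levels of the outcome

# Select the positive class by name rather than by position
present.prob <- probs[, "present"]
```

With the bank data, the equivalent would be predict(tree, test, type="prob")[, "yes"], assuming the positive level is spelled "yes".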

Then you can plug this directly into a ROC function, such as the one provided by pROC:

library(pROC)
tree.roc <- roc(test$y, tree.preds)
print(tree.roc)
plot(tree.roc)
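By default, roc() works out which level is the control, which is the case, and the direction of the comparison from the data. You may prefer to state these explicitly and report the AUC as well. A minimal sketch of that, again using rpart's kyphosis data as a stand-in for the bank data (an assumption for illustration), and scoring the training data only to keep the example short, which gives an optimistic curve:

```r
library(rpart)
library(pROC)

# Stand-in data: kyphosis ships with rpart ("present" is the positive class)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
probs <- predict(fit, kyphosis, type = "prob")[, "present"]

# State explicitly which level is the control and which is the case
roc.obj <- roc(kyphosis$Kyphosis, probs,
               levels = c("absent", "present"),  # controls first, cases second
               direction = "<")                  # controls are expected to score lower

auc(roc.obj)                     # area under the curve
plot(roc.obj, print.auc = TRUE)  # draw the curve with the AUC printed on it
```

For the bank data you would pass test$y and tree.preds instead, with levels = c("no", "yes") if those are the level names.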
Luis Miguel
Calimo