ROC curve from train/test set in caret R package

Question

I am trying to plot a ROC curve for a model fitted on a train/test split created with the caret R package. Either I am not passing the right data to the plotting call, or I am missing something about how the train/test sets were created. Any insight?

*Edited with correct answer

library(caret)
library(mlbench)
set.seed(506)
data(whas)
inTrain <- createDataPartition(y = whas$bin.frail,
p = .75, list = FALSE)
str(inTrain)
training <- whas[ inTrain,]
testing <- whas[-inTrain,]
nrow(training)
nrow(testing)
tc <- trainControl("cv", 10, savePredictions = TRUE)  # "cv" = cross-validation, 10-fold
mod1 <- train(bin.frail ~ .,           # "." = all other columns as predictors (assumed)
              data      = training,
              method    = "glm",
              family    = binomial,
              trControl = tc)

library(pROC)
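# type = "prob" returns one probability column per class
mod1pred <- predict(mod1, newdata = testing, type = "prob")
# Pass only the second (positive-class) column to roc()
plot(roc(testing$bin.frail, mod1pred[[2]]), print.auc = TRUE, col = "red",
     xlim = c(0, 1))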

Including a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in your question will increase your chances of getting an answer. — Samuel, Oct 20 '17 at 00:17
Isn't `caret` returning probabilities for both classes? If so, make sure you are passing to `roc` only the "positive" class probabilities. — m-dz, Oct 20 '17 at 00:55
What package contains the `whas` dataset? As it stands we can't reproduce your issue. — josliber, Oct 20 '17 at 01:01
@m-dz: this was my original thought, but the poster isn't using `caret`s version of `predict` (e.g., note `response` argument in place of `type`) so the object returned should be a vector. — jruf003, Oct 20 '17 at 01:04
Figured it out! Caret does produce both probabilities so we need to specify one. This is how I ended up plotting: plot(roc(testing$bin.frail, mod1pred[[2]]),print.auc=TRUE, col="red", xlim=c(0,1)) — A. Kather, Oct 20 '17 at 01:09

jruf003 · Accepted Answer · 2017-10-20T01:09:35.287

It's hard to know for sure without a reproducible example, but presumably your response variable `bin.frail` isn't numeric. For example, it might be coded using letters (e.g., "Y", "N"), or with numbers that are being stored as a factor. You could check this using `is.numeric(whas$bin.frail)`.
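A minimal sketch of that check, using a made-up vector `bin_frail` as a stand-in for `whas$bin.frail` (the real column isn't available here, so its coding is an assumption):

```r
# Hypothetical stand-in for whas$bin.frail: 0/1 values stored as a factor
bin_frail <- factor(c(0, 1, 1, 0, 1))

is.numeric(bin_frail)   # FALSE: a factor is not numeric
is.factor(bin_frail)    # TRUE
levels(bin_frail)       # "0" "1"

# Convert via character to recover the original 0/1 values;
# as.numeric() on the factor itself would return the level codes (1 and 2)
bin_frail_num <- as.numeric(as.character(bin_frail))
is.numeric(bin_frail_num)  # TRUE
```

If the column turns out to be a factor, `str(whas$bin.frail)` also shows its levels, which helps when deciding which class counts as "positive".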

As a side note, in your call to `roc()` it looks like `mod1pred` is being created from your training data whereas `testing$bin.frail` is from your test data. You could correct this by adding `newdata = testing` to your call to `predict` when creating `mod1pred`.
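To illustrate the point about matching the predictions to the test set, here is a rough sketch on simulated data. The data frame `dat`, its columns `x1`, `x2`, `outcome`, and the class labels "no"/"yes" are all invented for the example and are not the `whas` data; it assumes the `caret` and `pROC` packages are installed.

```r
library(caret)
library(pROC)

set.seed(506)

# Simulated stand-in data (hypothetical; not the whas dataset)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(0.8 * x1 - 0.5 * x2)
dat <- data.frame(
  x1 = x1,
  x2 = x2,
  outcome = factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))
)

# Stratified 75/25 split
in_train <- createDataPartition(y = dat$outcome, p = 0.75, list = FALSE)
training <- dat[in_train, ]
testing  <- dat[-in_train, ]

# Logistic regression with 10-fold cross-validation
tc  <- trainControl(method = "cv", number = 10)
fit <- train(outcome ~ ., data = training, method = "glm",
             family = binomial, trControl = tc)

# Predict on the held-out rows; type = "prob" gives one column per class
test_prob <- predict(fit, newdata = testing, type = "prob")

# Response and predictor both come from `testing`, so their lengths match.
# Only the "yes" (positive-class) probabilities are passed to roc().
roc_test <- roc(response = testing$outcome, predictor = test_prob[["yes"]],
                levels = c("no", "yes"), direction = "<")
plot(roc_test, print.auc = TRUE, col = "red")
auc(roc_test)
```

Selecting the column by name (`test_prob[["yes"]]`) rather than by position makes it explicit which class is being treated as positive.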
