I am trying to plot a ROC curve for a model trained on a train/test split created with the caret R package. Either I am not passing the right data to the plotting call, or I am missing something about how the train/test sets are created. Any insight?

*Edited with correct answer

library(caret)
library(mlbench)
set.seed(506)
data(whas)
# Hold out 25% of the rows for testing, stratified on the outcome
inTrain <- createDataPartition(y = whas$bin.frail,
                               p = .75, list = FALSE)
str(inTrain)
training <- whas[ inTrain, ]
testing  <- whas[-inTrain, ]
nrow(training)
nrow(testing)
tc <- trainControl("cv", 10, savePredictions = TRUE)  # "cv" = cross-validation, 10-fold
mod1 <- train(bin.frail ~ .,
              data      = training,
              method    = "glm",
              family    = binomial,
              trControl = tc)

library(pROC)
# Predict class probabilities on the test set; type = "prob" returns one column per class
mod1pred <- predict(mod1, newdata = testing, type = "prob")
# Pass only the second class's probabilities to roc()
plot(roc(testing$bin.frail, mod1pred[[2]]), print.auc = TRUE, col = "red",
     xlim = c(0, 1))
jmuhlenkamp
A. Kather
  • Including a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in your question will increase your chances of getting an answer. – Samuel Oct 20 '17 at 00:17
  • Isn't `caret` returning probabilities for both classes? If so, make sure you are passing to `roc` only the "positive" class probabilities. – m-dz Oct 20 '17 at 00:55
  • What package contains the `whas` dataset? As it stands we can't reproduce your issue. – josliber Oct 20 '17 at 01:01
  • @m-dz: this was my original thought, but the poster isn't using `caret`s version of `predict` (e.g., note `response` argument in place of `type`) so the object returned should be a vector. – jruf003 Oct 20 '17 at 01:04
  • Figured it out! Caret does produce both probabilities so we need to specify one. This is how I ended up plotting: plot(roc(testing$bin.frail, mod1pred[[2]]),print.auc=TRUE, col="red", xlim=c(0,1)) – A. Kather Oct 20 '17 at 01:09

1 Answer

It's hard to know for sure without a reproducible example, but presumably your response variable `bin.frail` isn't numeric. For example, it might be coded as letters (e.g., "Y", "N"), or as numbers stored as a factor. You can check this with `is.numeric(whas$bin.frail)`.
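
To illustrate the check, here is a minimal sketch in base R. The vector `bin_frail` is a made-up stand-in for `whas$bin.frail` (the real column's coding isn't shown in the question), assuming the case where 0/1 values ended up stored as a factor:

```r
# Hypothetical stand-in for whas$bin.frail: 0/1 values stored as a factor
bin_frail <- factor(c(0, 1, 1, 0, 1))

is.numeric(bin_frail)  # FALSE, because a factor is not numeric
is.factor(bin_frail)   # TRUE
levels(bin_frail)      # the labels the values are stored under

# Convert via character to recover the original values;
# as.numeric() on the factor alone would return the internal level codes instead
bin_frail_num <- as.numeric(as.character(bin_frail))
is.numeric(bin_frail_num)  # TRUE
```

Whether a conversion is actually needed depends on what the downstream function expects, so treat this only as a way to inspect the column.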

As a side note, in your call to `roc()` it looks like `mod1pred` is created from your training data, whereas `testing$bin.frail` comes from your test data. You can correct this by adding `newdata = testing` to your call to `predict` when creating `mod1pred`.
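
Since `whas` isn't available here, the following is only a sketch of the same workflow on a stand-in dataset: `PimaIndiansDiabetes` from `mlbench`, whose outcome `diabetes` has levels "neg" and "pos". The dataset, the object names (`in_train`, `fit`, `probs`, `roc_obj`), and the choice of "pos" as the positive class are all assumptions made for illustration, not part of the original question:

```r
library(caret)
library(mlbench)
library(pROC)

set.seed(506)
data(PimaIndiansDiabetes)  # stand-in dataset (assumption); outcome is the factor `diabetes`

# Stratified 75/25 split on the outcome
in_train <- createDataPartition(y = PimaIndiansDiabetes$diabetes,
                                p = 0.75, list = FALSE)
training <- PimaIndiansDiabetes[in_train, ]
testing  <- PimaIndiansDiabetes[-in_train, ]

# Logistic regression with 10-fold cross-validation
tc  <- trainControl(method = "cv", number = 10)
fit <- train(diabetes ~ ., data = training, method = "glm",
             family = binomial, trControl = tc)

# Predict on the *test* set; type = "prob" gives one probability column per class
probs <- predict(fit, newdata = testing, type = "prob")

# Pass only the positive-class column to roc(), naming the class order explicitly
roc_obj <- roc(response = testing$diabetes, predictor = probs[["pos"]],
               levels = c("neg", "pos"), direction = "<")
plot(roc_obj, print.auc = TRUE, col = "red")
```

Selecting the column by name (`probs[["pos"]]`) rather than by position makes it harder to pass the wrong class's probabilities by accident.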

jruf003