
Using the R package caret, how can I generate a ROC curve based on the cross-validation results of the train() function?

Say, I do the following:

library(caret)
library(mlbench)    # provides the Sonar data

data(Sonar)
ctrl <- trainControl(method="cv", 
  summaryFunction=twoClassSummary, 
  classProbs=T)
rfFit <- train(Class ~ ., data=Sonar, 
  method="rf", preProc=c("center", "scale"), 
  trControl=ctrl)

The training function goes over a range of values of the mtry parameter and calculates the ROC AUC. I would like to see the associated ROC curve -- how do I do that?

Note: if the resampling method is LOOCV, then rfFit will contain a non-null data frame in the rfFit$pred slot, which seems to be exactly what I need. However, I need that for the "cv" method (k-fold cross-validation) rather than LOO.

Also: no, the roc function that used to be included in former versions of caret is not the answer -- it is a low-level function, and you can't use it if you don't have the prediction probabilities for each cross-validated sample.

smci
January
  • http://www.inside-r.org/packages/cran/caret/docs/roc – Frash Jun 30 '15 at 12:54
  • No, this is not the answer. First, modern versions of caret do not have that function. Second, the function needs a "variable to cut along" -- specifically, the prediction probabilities -- but how do I get those from the object returned by the train() function? – January Jun 30 '15 at 12:57

3 Answers


There is just the savePredictions = TRUE argument missing from ctrl (this also works for other resampling methods):

library(caret)
library(mlbench)
data(Sonar)
ctrl <- trainControl(method="cv", 
                     summaryFunction=twoClassSummary, 
                     classProbs=T,
                     savePredictions = T)
rfFit <- train(Class ~ ., data=Sonar, 
               method="rf", preProc=c("center", "scale"), 
               trControl=ctrl)
library(pROC)
# Select a parameter setting
selectedIndices <- rfFit$pred$mtry == 2
# Plot:
plot.roc(rfFit$pred$obs[selectedIndices],
         rfFit$pred$M[selectedIndices])

[ROC curve plotted with pROC::plot.roc]

Maybe I am missing something, but a small concern is that train always estimates slightly different AUC values than plot.roc and pROC::auc (absolute difference < 0.005), although twoClassSummary uses pROC::auc to estimate the AUC. Edit: I assume this happens because the ROC value reported by train is the average of the AUCs computed on the separate CV folds, whereas here we calculate a single AUC over all resamples at once to obtain the overall AUC.
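
For illustration, here is a small sketch (assuming the rfFit object from above; pred$obs, pred$M, pred$mtry and pred$Resample are the columns caret stores when savePredictions = TRUE) that computes both quantities, so you can see where the small discrepancy comes from:

# Hold-out predictions for one parameter setting
preds <- rfFit$pred[rfFit$pred$mtry == 2, ]
# AUC computed within each CV fold, then averaged -- roughly what train() reports
foldAUC <- sapply(split(preds, preds$Resample), function(d)
  as.numeric(pROC::auc(d$obs, d$M, levels = c("R", "M"), direction = "<")))
mean(foldAUC)
# AUC pooled over all hold-out predictions at once -- what plot.roc above shows
as.numeric(pROC::auc(preds$obs, preds$M, levels = c("R", "M"), direction = "<"))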

Update: Since this is getting a bit of attention, here's a solution using plotROC::geom_roc() for ggplot2:

library(ggplot2)
library(plotROC)
ggplot(rfFit$pred[selectedIndices, ], 
       aes(m = M, d = factor(obs, levels = c("R", "M")))) + 
    geom_roc(hjust = -0.4, vjust = 1.5) + coord_equal()

[ROC curve plotted with plotROC::geom_roc]

thie1e
  • Your comment about averaging many AUCs versus the one created from the OOB samples is correct. They will be somewhat different. – topepo Jul 15 '15 at 12:26
  • You can extract the final model's mtry with `rfFit$finalModel$mtry`. – Brian D Dec 08 '17 at 18:56
  • Which is the correct way to get the cross-validated AUC -- to create a single overall AUC or to average the AUCs across the separate cross-validation sets? – samleighton87 Nov 05 '21 at 13:08
  • May I ask, what are M and obs? – Nadia Nadou Nov 01 '22 at 18:33
  • @NadiaNadou `rfFit$pred$M` and `rfFit$pred$R` are the predicted class probabilities for M and R; `obs` are the correct (observed) values in the cross-validation sets. – thie1e Nov 19 '22 at 19:54

Here I'm modifying the plot from @thie1e's answer, which others may find helpful.

Train model and make predictions

library(caret)
library(ggplot2)
library(mlbench)
library(plotROC)

data(Sonar)

ctrl <- trainControl(method="cv", summaryFunction=twoClassSummary, classProbs=T,
                     savePredictions = T)

rfFit <- train(Class ~ ., data=Sonar, method="rf", preProc=c("center", "scale"), 
               trControl=ctrl)

# Select a parameter setting
selectedIndices <- rfFit$pred$mtry == 2

Updated ROC curve plot

g <- ggplot(rfFit$pred[selectedIndices, ], aes(m=M, d=factor(obs, levels = c("R", "M")))) + 
  geom_roc(n.cuts=0) + 
  coord_equal() +
  style_roc()

g + annotate("text", x=0.75, y=0.25, label=paste("AUC =", round((calc_auc(g))$AUC, 4)))

[ROC curve with AUC annotation]

Megatron

Updated 2019. The easiest way is the MLeval package: https://cran.r-project.org/web/packages/MLeval/index.html. It gets the optimal parameters from the caret object along with the prediction probabilities, then calculates a number of metrics and plots, including ROC curves, PR curves, PRG curves, and calibration curves. You can pass in objects from several different models to compare the results.

library(MLeval)
library(caret)
library(mlbench)   # provides the Sonar data

data(Sonar)
ctrl <- trainControl(method="cv", 
  summaryFunction=twoClassSummary, 
  classProbs=T,
  savePredictions = T)   # evalm() needs the saved hold-out predictions
rfFit <- train(Class ~ ., data=Sonar, 
  method="rf", preProc=c("center", "scale"), 
  trControl=ctrl)

## run MLeval

res <- evalm(rfFit)

## get ROC

res$roc

## get calibration curve

res$cc

## get precision recall gain curve

res$prg

[ROC, calibration, and precision-recall gain curves produced by MLeval]
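
Since evalm() can also take several models for comparison (as mentioned above), here is a rough sketch; the gbm model is just a hypothetical second model, and I'm assuming MLeval's list interface with the gnames argument for labelling the groups:

## hypothetical second model to compare against the random forest
gbmFit <- train(Class ~ ., data=Sonar, method="gbm",
                trControl=ctrl, verbose=FALSE)

## evaluate both models together; one ROC plot with a curve per model
res2 <- evalm(list(rfFit, gbmFit), gnames=c("rf", "gbm"))
res2$roc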

Christopher John
  • I tried your solution and get an error: `Error in evalm(rfFit) : No probabilities found in Caret output` – Bolle Jun 05 '20 at 17:05
  • @Bolle I got the same as you did. You need to set savePredictions = TRUE in the trainControl. – SomeDutchGuy Jun 08 '20 at 13:12
  • And now how can I apply this optimal cutoff to the test dataset and get the confusion matrix with MLeval? – skan Mar 24 '21 at 18:58