
I am trying to obtain the ROC curve on the test set for the best model from caret. I came across the MLeval package, which seems handy (the output is very thorough, providing all the needed metrics and graphs with a few lines of code). A nice example is here: https://stackoverflow.com/a/59134729/12875646

I am trying the code below and am able to obtain the required metrics/graphs for the training set, but I keep getting an error when I try to work on the testing set.

library(caret)
library(MLeval)
data(GermanCredit)

Train <- createDataPartition(GermanCredit$Class, p=0.6, list=FALSE)
training <- GermanCredit[ Train, ]
testing <- GermanCredit[ -Train, ]


ctrl <- trainControl(method = "repeatedcv", number = 10, classProbs = TRUE,
                     summaryFunction = twoClassSummary, savePredictions = TRUE)

mod_fit <- train(Class ~ Age + ForeignWorker + Property.RealEstate + Housing.Own + 
    CreditHistory.Critical,  data=training, method="glm", family="binomial",
    trControl = ctrl, tuneLength = 5, metric = "ROC")

pred <- predict(mod_fit, newdata=testing)
confusionMatrix(data=pred, testing$Class)

test <- evalm(mod_fit) # ROC from the cross-validation predictions saved during training (not the test set)

test1 <- evalm(pred) # I am trying this to calculate the ROC curve for the test set (I understand this should be the final curve to report), but I keep getting this error: 

Error in evalm(pred) : Data frame or Caret train object required please.
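
For what it's worth, checking the class of pred shows why evalm complains; predict() returns the predicted class labels by default, not probabilities:

class(pred)
# [1] "factor" -- evalm needs class probabilities plus the observed labels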

On the package website, the first argument can be a data frame with the probabilities and observed data. Do you know how to prepare this data frame using caret? https://www.rdocumentation.org/packages/MLeval/versions/0.1/topics/evalm

Thank you.

Update:

This should be the correct script; it works well except for displaying more than one ROC curve on one graph:

library(caret)
library(MLeval)
data(GermanCredit)

Train <- createDataPartition(GermanCredit$Class, p=0.6, list=FALSE)
training <- GermanCredit[ Train, ]
testing <- GermanCredit[ -Train, ]


ctrl <- trainControl(method = "repeatedcv", number = 10, classProbs = TRUE,
                     summaryFunction = twoClassSummary, savePredictions = TRUE)

mod_fit <- train(Class ~ Age + ForeignWorker + Property.RealEstate + Housing.Own + 
    CreditHistory.Critical,  data=training, method="glm", family="binomial",
    trControl = ctrl, tuneLength = 5, metric = "ROC")

pred <- predict(mod_fit, newdata=testing, type="prob") # class probabilities for evalm
pred_class <- predict(mod_fit, newdata=testing)        # class labels for confusionMatrix

confusionMatrix(data=pred_class, reference=testing$Class)

test <- evalm(mod_fit) # ROC from the cross-validation predictions saved during training
m1 <- data.frame(pred, testing$Class)
 
test1 <- evalm(m1)

# Train and evaluate a second model:
mod_fit2 <- train(Class ~ Age + ForeignWorker + Property.RealEstate + Housing.Own,
    data=training, method="glm", family="binomial",
    trControl = ctrl, tuneLength = 5, metric = "ROC")


pred2 <- predict(mod_fit2, newdata=testing, type="prob")
m2 = data.frame(pred2, testing$Class)

test2 <- evalm(m2)


# Display ROCs for both models in one graph: 

compare <- evalm(list(m1, m2), gnames=c('logistic1','logistic2')) 

I got the last step in the code from this source: https://www.r-bloggers.com/how-to-easily-make-a-roc-curve-in-r/

However, it only displays one ROC curve (it works well when I pass the caret train objects instead of the data frames).

Bahi8482
  • First, do you realize that you trained on the full data set (with cross-validation), not just on the training set? `mod_fit <- train([...], data=GermanCredit, [...])` – Calimo Jul 10 '20 at 06:45
  • The function `predict` outputs an object of class `factor`, which is not what `evalm` expects. That said, I didn't find a solution within the package, but I didn't have a close look. You can find more resources [here](https://stackoverflow.com/questions/30366143/how-to-compute-roc-and-auc-under-roc-after-training-using-caret-in-r?noredirect=1&lq=1) and [here](https://cran.rstudio.org/doc/contrib/Sharma-CreditScoring.pdf). The second resource might not be fancy, but it does the job. – DJJ Jul 10 '20 at 07:30
  • @Calimo thanks for pointing that out. This was a typo - I fixed it. – Bahi8482 Jul 10 '20 at 15:04
  • In the `train` function, the training data should be used, not the testing data as you have used. – UseR10085 Jul 10 '20 at 17:48
  • @BappaDas yes, thank you. sorry about that. – Bahi8482 Jul 10 '20 at 18:42
  • @DJJ thanks for sharing these 2 resources. They are helpful for a deeper understanding of the process and can be used to reproduce the results using the pROC package (a quick pROC sketch follows after these comments). – Bahi8482 Jul 10 '20 at 19:01
  • @BappaDas I am now trying to display more than one ROC curve on one graph. I used the same code that I used for the caret train outputs (but replaced them with the data frames), but it only shows one ROC curve. Do you have any suggestions? The updated code has been added to the question. Thank you. – Bahi8482 Jul 14 '20 at 01:54
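
For reference, a minimal pROC cross-check of the test-set ROC mentioned in the comments (a sketch: it assumes pred holds the test-set class probabilities as in the question's update, and that pROC is installed):

library(pROC)

# build the ROC curve from the probability of the "Good" class;
# levels = c(controls, cases) for the GermanCredit outcome
roc_obj <- roc(response = testing$Class, predictor = pred$Good,
               levels = c("Bad", "Good"))
plot(roc_obj)      # plot the ROC curve
auc(roc_obj)       # area under the curve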

2 Answers


You can use the following code:

library(MLeval)
pred <- predict(mod_fit, newdata=testing, type="prob")
test1 <- evalm(data.frame(pred, testing$Class))

(plot: ROC curve for the test set)
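
evalm() also returns its results as a list, so you can pull the pieces back out afterwards (slot names as I read them in the MLeval documentation; worth double-checking on your version):

test1$roc     # the ROC plot as a ggplot object
test1$stdres  # the standard metrics, including AUC-ROC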

If you want to change the name "Group1" to something else, like GLM, you can use the following code:

test1 <- evalm(data.frame(pred, testing$Class, Group = "GLM"))

(plot: ROC curve with the legend label GLM)

UseR10085

Just wanted to add that you can generate a data frame with the results from several predictors, adding the ground-truth column (obs) and an additional column (Group) telling evalm() which predictor they came from, and it will plot them all on one graph. Source: the evalm() function help.

# predict from several models
predicted_xgb <- predict(model_xgb, newdata = testData3, type = "prob")
predicted_adaboost <- predict(model_adaboost, newdata = testData3, type = "prob")
predicted_rf <- predict(model_rf, newdata = testData3, type = "prob")

# append the ground-truth (obs) and model-name (Group) columns
predicted_xgb$obs <- testData3$pred_group
predicted_xgb$Group <- "xgb"
predicted_adaboost$obs <- testData3$pred_group
predicted_adaboost$Group <- "adaboost"
predicted_rf$obs <- testData3$pred_group
predicted_rf$Group <- "rf"

# combine into one data frame
combo_df <- rbind(predicted_xgb, predicted_adaboost, predicted_rf)

# evaluate all three models at once
test2 <- evalm(combo_df)

(plot: ROC curves of all 3 models on one graph)
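
The object returned by evalm() should also let you pull the combined plot and the per-model metrics back out (again per my reading of the MLeval docs; the exact slot names are worth verifying):

test2$roc     # the combined ROC plot for all three models
test2$stdres  # standard metrics, reported per Group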