
I am trying to obtain the ROC curve on the test set for the best model from caret. I came across the MLeval package, which seems handy (the output is very thorough, providing all the needed metrics and graphs with a few lines of code). A nice example is here: https://stackoverflow.com/a/59134729/12875646

I am trying the code below and am able to obtain the required metrics/graphs for the training set, but I keep getting an error when I try to work on the testing set.

library(caret)
library(MLeval)
data(GermanCredit)

Train <- createDataPartition(GermanCredit$Class, p=0.6, list=FALSE)
training <- GermanCredit[ Train, ]
testing <- GermanCredit[ -Train, ]


ctrl <- trainControl(method = "repeatedcv", number = 10, classProbs = TRUE,
                     summaryFunction = twoClassSummary, savePredictions = TRUE)

mod_fit <- train(Class ~ Age + ForeignWorker + Property.RealEstate + Housing.Own + 
    CreditHistory.Critical,  data=training, method="glm", family="binomial",
    trControl = ctrl, tuneLength = 5, metric = "ROC")

pred <- predict(mod_fit, newdata=testing)
confusionMatrix(data=pred, testing$Class)

test <- evalm(mod_fit) # ROC from the cross-validation predictions saved during training (not the test set)

test1 <- evalm(pred) # I am trying this to calculate the ROC curve for the test set (I understand this should be the final curve to report), but I keep getting this error: 

Error in evalm(pred) : Data frame or Caret train object required please.
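
For what it's worth, checking the class of pred shows why evalm complains; predict() returns the predicted class labels by default, not probabilities:

class(pred)
# [1] "factor" -- evalm needs class probabilities plus the observed labels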

On the package website, the first argument can be a data frame with the probabilities and observed data. Do you know how to prepare this data frame using caret? https://www.rdocumentation.org/packages/MLeval/versions/0.1/topics/evalm

Thank you.

Update:

This should be the correct script; it works well except for displaying more than one ROC curve on one graph:

library(caret)
library(MLeval)
data(GermanCredit)

Train <- createDataPartition(GermanCredit$Class, p=0.6, list=FALSE)
training <- GermanCredit[ Train, ]
testing <- GermanCredit[ -Train, ]


ctrl <- trainControl(method = "repeatedcv", number = 10, classProbs = TRUE,
                     summaryFunction = twoClassSummary, savePredictions = TRUE)

mod_fit <- train(Class ~ Age + ForeignWorker + Property.RealEstate + Housing.Own + 
    CreditHistory.Critical,  data=training, method="glm", family="binomial",
    trControl = ctrl, tuneLength = 5, metric = "ROC")

pred <- predict(mod_fit, newdata=testing, type="prob") # class probabilities for evalm
pred_class <- predict(mod_fit, newdata=testing)        # class labels for confusionMatrix

confusionMatrix(data=pred_class, reference=testing$Class)

test <- evalm(mod_fit) # ROC from the cross-validation predictions saved during training
m1 <- data.frame(pred, testing$Class)
 
test1 <- evalm(m1)

# Train and evaluate a second model:
mod_fit2 <- train(Class ~ Age + ForeignWorker + Property.RealEstate + Housing.Own,
    data=training, method="glm", family="binomial",
    trControl = ctrl, tuneLength = 5, metric = "ROC")


pred2 <- predict(mod_fit2, newdata=testing, type="prob")
m2 = data.frame(pred2, testing$Class)

test2 <- evalm(m2)


# Display ROCs for both models in one graph: 

compare <- evalm(list(m1, m2), gnames=c('logistic1','logistic2')) 

I got the last step in the code from this source: https://www.r-bloggers.com/how-to-easily-make-a-roc-curve-in-r/

However, it only displays one ROC curve (it works well when I pass the caret train objects instead of the data frames).

Bahi8482
  • First, do you realize that you trained on the full data set (with cross-validation), not just on the training set? `mod_fit <- train([...], data=GermanCredit, [...])` – Calimo Jul 10 '20 at 06:45
  • The function `predict` outputs an object of class `factor`, which is not what `evalm` expects. That said, I didn't find a solution within the package, but I didn't have a close look. You can find more resources [here](https://stackoverflow.com/questions/30366143/how-to-compute-roc-and-auc-under-roc-after-training-using-caret-in-r?noredirect=1&lq=1) and [here](https://cran.rstudio.org/doc/contrib/Sharma-CreditScoring.pdf). The second resource might not be fancy, but it does the job. – DJJ Jul 10 '20 at 07:30
  • @Calimo thanks for pointing that out. This was a typo - I fixed it. – Bahi8482 Jul 10 '20 at 15:04
  • In the `train` function, the training data should be used, not the testing data as you have used. – UseR10085 Jul 10 '20 at 17:48
  • @BappaDas yes, thank you. sorry about that. – Bahi8482 Jul 10 '20 at 18:42
  • @DJJ thanks for sharing these 2 resources. They are helpful for a deeper understanding of the process and can be used to reproduce the results using the pROC package (a quick pROC sketch follows after these comments). – Bahi8482 Jul 10 '20 at 19:01
  • @BappaDas I am now trying to display more than one ROC curve on one graph. I used the same code that I used for the caret train outputs (but replaced them with the data frames), but it only shows one ROC curve. Do you have any suggestions? The updated code has been added to the question. Thank you. – Bahi8482 Jul 14 '20 at 01:54
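
For reference, a minimal pROC cross-check of the test-set ROC mentioned in the comments (a sketch: it assumes pred holds the test-set class probabilities as in the question's update, and that pROC is installed):

library(pROC)

# build the ROC curve from the probability of the "Good" class;
# levels = c(controls, cases) for the GermanCredit outcome
roc_obj <- roc(response = testing$Class, predictor = pred$Good,
               levels = c("Bad", "Good"))
plot(roc_obj)      # plot the ROC curve
auc(roc_obj)       # area under the curve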

2 Answers


You can use the following code:

library(MLeval)
pred <- predict(mod_fit, newdata=testing, type="prob")
test1 <- evalm(data.frame(pred, testing$Class))

(plot: ROC curve for the test set)
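
evalm() also returns its results as a list, so you can pull the pieces back out afterwards (slot names as I read them in the MLeval documentation; worth double-checking on your version):

test1$roc     # the ROC plot as a ggplot object
test1$stdres  # the standard metrics, including AUC-ROC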

If you want to change the name "Group1" to something else, like GLM, you can use the following code:

test1 <- evalm(data.frame(pred, testing$Class, Group = "GLM"))

(plot: ROC curve with the legend label GLM)

UseR10085

Just wanted to add that you can generate a data frame with the results from several predictors, adding the ground-truth column (obs) and an additional column (Group) telling evalm() which predictor they came from, and it will plot them all on one graph. Source: the evalm() function help.

# predict from several models
predicted_xgb <- predict(model_xgb, newdata = testData3, type = "prob")
predicted_adaboost <- predict(model_adaboost, newdata = testData3, type = "prob")
predicted_rf <- predict(model_rf, newdata = testData3, type = "prob")

# append the ground-truth (obs) and model-name (Group) columns
predicted_xgb$obs <- testData3$pred_group
predicted_xgb$Group <- "xgb"
predicted_adaboost$obs <- testData3$pred_group
predicted_adaboost$Group <- "adaboost"
predicted_rf$obs <- testData3$pred_group
predicted_rf$Group <- "rf"

# combine into one data frame
combo_df <- rbind(predicted_xgb, predicted_adaboost, predicted_rf)

# evaluate all three models at once
test2 <- evalm(combo_df)

(plot: ROC curves of all 3 models on one graph)
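
The object returned by evalm() should also let you pull the combined plot and the per-model metrics back out (again per my reading of the MLeval docs; the exact slot names are worth verifying):

test2$roc     # the combined ROC plot for all three models
test2$stdres  # standard metrics, reported per Group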