
When comparing the ROC curves of machine learning models trained on normal versus down sampled data, the resulting sensitivity and specificity are often very different, because down sampled models even out the classes and place more emphasis on capturing the minority class. Why, then, do the resulting ROC curves look so similar?

I think the question is best explained with a simple example based on this question here.

First, take the Sonar data and manually down sample the "R" class to imbalance the data and illustrate my question:

library(caret)
library(dplyr)
library(ggplot2)
library(mlbench)
library(plotROC)

data(Sonar)
set.seed(2019)
sonar_R <- Sonar %>% filter(Class == "R") %>% sample_n(., 20)
Sonar <- Sonar %>% filter(Class == "M") %>% rbind(sonar_R) 
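
To confirm the imbalance this creates, a quick check of the class counts (with the standard mlbench Sonar data this should leave about 111 "M" rows against the 20 sampled "R" rows):

table(Sonar$Class)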

Now use caret to fit random forest models, both with ordinary sampling and with down sampling of the majority class:

ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                     summaryFunction = twoClassSummary, classProbs = TRUE,
                     savePredictions = TRUE)

ctrl_down <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                          summaryFunction = twoClassSummary, classProbs = TRUE,
                          savePredictions = TRUE, sampling = "down")

rfFit <- train(Class ~ ., data=Sonar, method="rf", preProc=c("center", "scale"), 
           trControl=ctrl)

rfFit_down <- train(Class ~ ., data=Sonar, method="rf", preProc=c("center", "scale"), 
           trControl=ctrl_down)

I can now define a function that pulls out the row with the highest ROC (AUC) from the tuning results, along with the corresponding sensitivity and specificity:

max_accuracy <- function(model) {
  # return the tuning-grid row with the highest ROC (AUC),
  # together with its sensitivity and specificity
  model_accuracy <- as.data.frame(model$results) %>%
    select(ROC, Sens, Spec) %>%
    arrange(desc(ROC))
  return(model_accuracy[1, ])
}

max_accuracy(rfFit)
max_accuracy(rfFit_down)

Giving:

               ROC    Sens   Spec
Normal         0.910  1      0.16
Down Sampled   0.872  0.827  0.77
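
These values come from the default 0.5 probability cutoff applied to the held-out predictions. As a side check (confusionMatrix() called on a caret train object averages the resampled confusion matrices as cell percentages), the same picture appears in the confusion matrices:

confusionMatrix(rfFit)
confusionMatrix(rfFit_down)

For the imbalanced model almost everything is predicted "M", which is why Sens is 1 while Spec drops to 0.16.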

And also draw the ROC curves:

selectedIndices <- rfFit$pred$mtry == 2
g <- ggplot(rfFit$pred[selectedIndices, ], aes(m=M, d=factor(obs, levels = c("R", "M")))) + 
  geom_roc(n.cuts=0, increasing = FALSE) + 
  coord_equal() +
  style_roc(theme = theme_grey) +
  ggtitle("Normal") 
g + 
  annotate("text", x=0.75, y=0.25, label=paste("AUC =", round((calc_auc(g))$AUC, 4))) +
  scale_x_continuous("1 - Specificity") + scale_y_continuous("Sensitivity")


selectedIndices_down <- rfFit_down$pred$mtry == 2
g_down <- ggplot(rfFit_down$pred[selectedIndices_down, ], aes(m=M, d=factor(obs, levels = c("R", "M")))) + 
  geom_roc(n.cuts=0, increasing = FALSE) + 
  coord_equal() +
  style_roc(theme = theme_grey) +
  ggtitle("Down Sampled") 
g_down + 
  annotate("text", x=0.75, y=0.25, label=paste("AUC =",     round((calc_auc(g_down))$AUC, 4))) +
  scale_x_continuous("1 - Specificity") + scale_y_continuous("Sensitivity")

Which look like this:

[ROC curves for the "Normal" and "Down Sampled" models]

Why do the ROC curves look so similar? With drastically different sensitivity and specificity values, wouldn't the curves also look different from each other?

  • When you have imbalanced classes and you are not using any kind of sampling to mitigate that, then you most likely want to use a threshold other than the default one (0.5) to assign classes. To pick one you should use a threshold-aware metric, not AUC. While these ROC curves look similar, the 0.5 threshold point on them is at a very different place. – missuse Jun 01 '19 at 11:52
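
To make that comment concrete, here is a minimal sketch (assuming the pROC package, which is not used elsewhere in this post) of locating the default 0.5 cutoff on each model's ROC curve, and of picking a cutoff with a threshold-aware criterion such as Youden's J:

library(pROC)

roc_at_threshold <- function(model, threshold = 0.5) {
  # held-out repeated-CV predictions for mtry = 2, matching the plots above
  preds <- model$pred[model$pred$mtry == 2, ]
  r <- roc(response = preds$obs, predictor = preds$M, levels = c("R", "M"))
  list(
    at_default = coords(r, x = threshold, input = "threshold",
                        ret = c("threshold", "sensitivity", "specificity")),
    at_youden  = coords(r, x = "best", best.method = "youden",
                        ret = c("threshold", "sensitivity", "specificity"))
  )
}

roc_at_threshold(rfFit)
roc_at_threshold(rfFit_down)

The Sens/Spec values reported by caret correspond to the at_default point on each curve; the at_youden point is one example of a cutoff chosen from the curve itself rather than fixed at 0.5.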
