When comparing the ROC curves of machine learning models trained on normal versus down-sampled data, the reported sensitivity and specificity are often very different, because down-sampled models even out the classes and place more emphasis on capturing the minority class. Why, then, do the resulting ROC curves look so similar?
I think the question is best explained with a simple example based on this question here.
First, take the Sonar data and manually down-sample the "R" class to create an imbalance and illustrate my question:
library(caret)
library(dplyr)    # for %>%, filter(), sample_n()
library(ggplot2)
library(mlbench)
library(plotROC)

data(Sonar)
set.seed(2019)

# Keep only 20 of the "R" cases to create a class imbalance
sonar_R <- Sonar %>% filter(Class == "R") %>% sample_n(20)
Sonar <- Sonar %>% filter(Class == "M") %>% rbind(sonar_R)
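As a quick sanity check (not in the original code), the class counts should now be heavily skewed towards "M":

# Should show roughly 111 "M" versus 20 "R"
table(Sonar$Class)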
Now use caret to fit random forest models, once with ordinary resampling and once with down-sampling of the majority class:
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                     summaryFunction = twoClassSummary, classProbs = TRUE,
                     savePredictions = TRUE)

ctrl_down <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                          summaryFunction = twoClassSummary, classProbs = TRUE,
                          savePredictions = TRUE, sampling = "down")
rfFit <- train(Class ~ ., data = Sonar, method = "rf",
               preProc = c("center", "scale"), trControl = ctrl)

rfFit_down <- train(Class ~ ., data = Sonar, method = "rf",
                    preProc = c("center", "scale"), trControl = ctrl_down)
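For reference, with sampling = "down" caret applies down-sampling to the majority class inside each resampling iteration. A minimal sketch of the effect, using caret's downSample() on the full data (illustrative only, since the real procedure happens per resample):

# Illustrative only: downSample() balances the classes by randomly dropping
# majority-class rows, which is roughly what caret does internally per resample
balanced <- downSample(x = Sonar[, -ncol(Sonar)], y = Sonar$Class, yname = "Class")
table(balanced$Class)   # both classes reduced to the minority-class size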
I can now define a function to extract the maximum ROC (AUC) value and the corresponding sensitivity and specificity from the tuning results:
max_accuracy <- function(model) {
  # Return the row of the tuning results with the highest ROC (AUC)
  model$results %>%
    select(ROC, Sens, Spec) %>%
    arrange(desc(ROC)) %>%
    slice(1)
}
max_accuracy(rfFit)
max_accuracy(rfFit_down)
Giving:
              ROC    Sens   Spec
Normal        0.910  1.000  0.16
Down Sampled  0.872  0.827  0.77
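As a side note (not in the original post), the Sens and Spec that caret reports are computed from the hard class predictions on the held-out folds, which for a random forest effectively correspond to a 0.5 cutoff on the class probability. A rough check of where those numbers come from, pooling the held-out predictions rather than averaging per resample as caret does:

# Pool the held-out predictions for the selected mtry and tabulate them;
# the sensitivity/specificity here reflect the default 0.5 cutoff
preds <- rfFit$pred[rfFit$pred$mtry == rfFit$bestTune$mtry, ]
confusionMatrix(preds$pred, preds$obs)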
And also draw the ROC curves:
selectedIndices <- rfFit$pred$mtry == 2
g <- ggplot(rfFit$pred[selectedIndices, ], aes(m=M, d=factor(obs, levels = c("R", "M")))) +
geom_roc(n.cuts=0, increasing = FALSE) +
coord_equal() +
style_roc(theme = theme_grey) +
ggtitle("Normal")
g +
annotate("text", x=0.75, y=0.25, label=paste("AUC =", round((calc_auc(g))$AUC, 4))) +
scale_x_continuous("1 - Specificity") + scale_y_continuous("Sensitivity")
selectedIndices_down <- rfFit_down$pred$mtry == 2
g_down <- ggplot(rfFit_down$pred[selectedIndices_down, ], aes(m=M, d=factor(obs, levels = c("R", "M")))) +
geom_roc(n.cuts=0, increasing = FALSE) +
coord_equal() +
style_roc(theme = theme_grey) +
ggtitle("Down Sampled")
g_down +
annotate("text", x=0.75, y=0.25, label=paste("AUC =", round((calc_auc(g_down))$AUC, 4))) +
scale_x_continuous("1 - Specificity") + scale_y_continuous("Sensitivity")
Which look like this:
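To compare them directly, the two curves can also be overlaid in a single plot (a sketch, not part of the original code; it assumes mtry = 2 is present in both sets of saved predictions):

# Combine the held-out predictions from both models and label them,
# so geom_roc() draws one curve per model
both <- rbind(
  transform(rfFit$pred[selectedIndices, ], model = "Normal"),
  transform(rfFit_down$pred[selectedIndices_down, ], model = "Down Sampled")
)

ggplot(both, aes(m = M, d = factor(obs, levels = c("R", "M")), colour = model)) +
  geom_roc(n.cuts = 0, increasing = FALSE) +
  coord_equal() +
  style_roc(theme = theme_grey)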
Why do the ROC curves look so similar? With such drastically different sensitivity and specificity values, wouldn't the curves also look different from each other?