
I fitted a logistic regression model with 10-fold CV. I can use the pROC package to get the AUC, but the AUC does not seem to cover the whole 10-fold CV, because the cvAUC package gave a different AUC. I suspect the AUC from pROC is for one fold only. How can I extract the joint AUC for the 10-fold CV using the pROC package?
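
One plausible source of the discrepancy is that "the AUC for 10-fold CV" can be aggregated in two different ways. The cvAUC documentation describes its estimate as the mean of the per-fold AUCs, while handing pROC a single set of predictions produces one ROC curve for exactly those predictions. The base R sketch below uses hypothetical toy scores (not the iris data) and a hypothetical helper, rank_auc(), which computes the empirical AUC through the Mann-Whitney rank-sum identity, to show that the two summaries can disagree even when every fold ranks its own cases perfectly.

```r
# Hypothetical helper: empirical AUC via the Mann-Whitney rank-sum identity.
# labels must be coded 0 (control) / 1 (case); tied scores count as 1/2.
rank_auc <- function(scores, labels) {
  r  <- rank(scores)              # average ranks for ties
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical out-of-fold predictions from two folds: each fold's model
# ranks its own cases perfectly, but fold 2's scores sit on a higher scale
scores <- c(0.1, 0.2, 0.8, 0.9)
labels <- c(0,   1,   0,   1)
fold   <- c(1,   1,   2,   2)

# Option 1: AUC per fold, then the mean (the averaging approach)
fold_auc <- sapply(split(seq_along(scores), fold),
                   function(idx) rank_auc(scores[idx], labels[idx]))
mean_auc <- mean(fold_auc)

# Option 2: a single ROC curve over all pooled out-of-fold predictions
pooled_auc <- rank_auc(scores, labels)

# fold_auc is 1 for both folds, so mean_auc is 1; pooled_auc is 0.75
print(mean_auc)
print(pooled_auc)
```

Pooling mixes the two folds' score scales: the fold 1 case (0.2) ranks below the fold 2 control (0.8), which costs one of the four case/control pairs.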

library(pander)
library(dplyr)  # for %>%

data(iris)
# Keep two species so the outcome is binary; droplevels() drops the unused "virginica" level
data <- droplevels(iris[iris$Species %in% c("setosa", "versicolor"), ])
data$ID <- seq.int(nrow(data))
table(data$Species)

# Create the (stratified) folds once, before the loop
set.seed(3456)
folds <- caret::createFolds(data$Species, k = 10)

confusion_matrices <- list()
accuracy <- numeric(10)
for (i in 1:10) {
    test <- data[data$ID %in% folds[[i]], ]
    train <- data[data$ID %in% unlist(folds[-i]), ]
    # Exclude the row ID: it is not a real predictor (IDs 1-50 are all setosa)
    model1 <- glm(Species ~ . - ID, family = binomial, data = train)
    # Predicted probability of the second factor level, "versicolor"
    pred <- predict(model1, newdata = test, type = "response")
    # Class labels at a 0.5 cutoff, using the same levels as the reference
    predR <- factor(ifelse(pred >= 0.5, "versicolor", "setosa"),
                    levels = levels(data$Species))
    confusion_matrices[[i]] <- caret::confusionMatrix(predR, test$Species)
    accuracy[i] <- confusion_matrices[[i]]$overall["Accuracy"]
}
names(accuracy) <- paste("Fold", 1:10)
accuracy %>%
  pander::pandoc.table()
mean(accuracy)
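
In the loop above, pred and test are overwritten on every iteration, so an ROC curve computed from them after the loop only reflects fold 10, which matches the suspicion in the question. A minimal sketch, assuming the pROC and caret packages are installed: keep each fold's unrounded out-of-fold probabilities in one vector, then either build a single pooled ROC curve or build one curve per fold and average the AUCs. The names oof_prob, fold_id, pooled_auc and fold_auc are illustrative. Because setosa and versicolor are perfectly separable, glm() is expected to warn that fitted probabilities are numerically 0 or 1.

```r
library(caret)
library(pROC)

data(iris)
data <- droplevels(iris[iris$Species %in% c("setosa", "versicolor"), ])

set.seed(3456)
folds <- caret::createFolds(data$Species, k = 10)  # stratified: 5 + 5 rows per fold

# Out-of-fold probability of "versicolor" for every row, plus its fold number
oof_prob <- numeric(nrow(data))
fold_id  <- integer(nrow(data))
for (i in seq_along(folds)) {
  test_idx <- folds[[i]]
  model <- glm(Species ~ ., family = binomial, data = data[-test_idx, ])
  oof_prob[test_idx] <- predict(model, newdata = data[test_idx, ], type = "response")
  fold_id[test_idx]  <- i
}

# Pooled AUC: one ROC curve over all out-of-fold predictions.
# direction = "<" because higher probabilities mean "versicolor" (the case level).
roc_pooled <- pROC::roc(data$Species, oof_prob,
                        levels = c("setosa", "versicolor"),
                        direction = "<", quiet = TRUE)
pooled_auc <- as.numeric(pROC::auc(roc_pooled))

# Averaged AUC: one ROC curve per fold, then the mean of the fold AUCs
fold_auc <- sapply(seq_along(folds), function(i) {
  idx <- fold_id == i
  roc_i <- pROC::roc(data$Species[idx], oof_prob[idx],
                     levels = c("setosa", "versicolor"),
                     direction = "<", quiet = TRUE)
  as.numeric(pROC::auc(roc_i))
})

pooled_auc
mean(fold_auc)
```

mean(fold_auc) follows the averaging approach the cvAUC documentation describes, while pooled_auc summarises a single curve over all held-out rows. Pooling implicitly assumes the fold models' probabilities are on a comparable scale; averaging does not.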
  • I'm not familiar with cvAUC, but why are you rounding the predictions? You normally don't want to do that with ROC analysis... – Calimo May 17 '20 at 13:01
  • Thanks for your response, Calimo. You are right: I rounded the predictions here because my subsequent code for the confusion matrix needs rounded predictions (it all runs in one for loop). But with or without the rounding, pROC gives the same value, only with different ROC curve shapes. I would like to know why pROC gives a different value from cvAUC for logistic regression. Thanks – Prisy May 19 '20 at 09:36
  • Please add a reproducible example so we can see what's going on. See https://stackoverflow.com/help/minimal-reproducible-example and https://stackoverflow.com/q/5963269/333599 for tips more specific to R. – Calimo May 19 '20 at 10:58
  • @Calimo Thanks once again. I have used the iris data to show how I computed the mean accuracy for a 10-fold CV (see the edited code above). I would like to use your pROC package (or any other) to compute the AUC for the 10-fold CV. – Prisy May 19 '20 at 21:50
  • Right off the bat, pROC doesn't handle cross-validation. It just calculates the ROC curve for whatever data you give it. It's up to you to decide what you want to do with your CV results and write code to implement it. That could still be an interesting question, but now that you have removed cvAUC from your example, it is harder to answer. – Calimo May 27 '20 at 07:46
  • It's fine now. I am using the cvAUC package to estimate the AUC for the 10-fold CV. I thought I could do it with pROC as well. Thanks – Prisy May 28 '20 at 09:51
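
Calimo's point about rounding can be checked on a toy example. A sketch in base R, with hypothetical labels and probabilities and a hypothetical helper, rank_auc(), that computes the empirical AUC through the Mann-Whitney rank-sum identity: thresholding at 0.5 leaves a single operating point, so the AUC collapses to (sensitivity + specificity) / 2. On setosa vs versicolor, where every held-out prediction may already fall on the correct side of 0.5, both versions can equal 1, which may be why the rounded and unrounded values matched.

```r
# Hypothetical helper: empirical AUC via the Mann-Whitney rank-sum identity.
# labels must be coded 0 (control) / 1 (case); tied scores count as 1/2.
rank_auc <- function(scores, labels) {
  r  <- rank(scores)              # average ranks for ties
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

labels <- c(0,   0,   0,   1,   1,   1)
probs  <- c(0.1, 0.4, 0.6, 0.3, 0.7, 0.9)  # hypothetical predicted probabilities

auc_prob  <- rank_auc(probs, labels)                      # uses the full ranking: 7/9
auc_round <- rank_auc(as.numeric(probs >= 0.5), labels)   # only one cutoff left: 2/3

# With a single 0/1 cutoff, the AUC equals the mean of sensitivity and specificity
sens <- mean(probs[labels == 1] >= 0.5)   # 2 of 3 cases above the cutoff
spec <- mean(probs[labels == 0] <  0.5)   # 2 of 3 controls below the cutoff
(sens + spec) / 2
```

Rounding discards the ordering within each predicted class, which is exactly the information the ROC curve is built from.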

0 Answers