
Is it possible to get a high AUC but still fail the Hosmer-Lemeshow test? This is a simple logistic fit with just three explanatory variables. When I run the Hosmer-Lemeshow test manually the result looks good; only the p-value reported by R is consistently low.

library(ROCR)              # prediction(), performance()
library(ResourceSelection) # hoslem.test()

prd <- prediction(train$pred, train$Purchased)
roc.perf <- performance(prd, measure = "tpr", x.measure = "fpr")
plot(roc.perf, colorize = TRUE)
hist(train$pred, breaks = 50)

auc.perf <- performance(prd, measure = "auc")
auc.perf@y.values[[1]]
# AUC 0.923

# Somers' D (somersD() comes from a non-base package)
somersD(test$Purchased, test$pred)
# 0.8854

# Hosmer-Lemeshow test
hoslem.test(test$Purchased, test$pred)
# p-value = 0.0048
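For reference, the "manual" check mentioned in the question can be reproduced in base R. This is a minimal sketch of the Hosmer-Lemeshow chi-square with decile groups (the same grouping `hoslem.test` uses by default, `g = 10`); it is an illustration, not a replacement for the package function:

```r
# Minimal Hosmer-Lemeshow test in base R: group observations by deciles of
# predicted probability, then compare observed vs expected event counts.
hl_test <- function(y, p, g = 10) {
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp <- cut(p, breaks = breaks, include.lowest = TRUE)
  obs <- tapply(y, grp, sum)     # observed events per group
  exp <- tapply(p, grp, sum)     # expected events per group (sum of p)
  n   <- tapply(y, grp, length)  # group sizes
  # H-L chi-square: sum over groups of (O - E)^2 / (E * (1 - E/n))
  chisq <- sum((obs - exp)^2 / (exp * (1 - exp / n)))
  df <- length(obs) - 2          # conventional df for g groups
  c(statistic = chisq, p.value = pchisq(chisq, df, lower.tail = FALSE))
}
```

Called as `hl_test(test$Purchased, test$pred)`, this should track the statistic that `hoslem.test` reports, so it is a quick way to see which deciles drive the rejection.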
  • We probably can't answer this without a [mcve] (see also [this question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)). In addition to a data set, it would be useful to specify what packages you're using beyond base R (`hoslem.test` is not in base R ...) – Ben Bolker Dec 19 '18 at 16:37
    One possibility (better suited to [CrossValidated](https://stats.stackexchange.com)) is that if you have a large enough (real, not simulated) data set, goodness-of-fit tests like H-L will *always* fail, even if the model predicts reasonably well ... – Ben Bolker Dec 19 '18 at 16:48
  • Thanks so much!!! Is there any way of testing? I tried to manually divide the train and test data set and checked actual scores vs. fitted scores... didn't find much deviation... can we assume that the model seems fine and ignore the HL test? – Sujatha Gopal Dec 21 '18 at 12:41
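The comments above hinge on the distinction between discrimination (what AUC measures) and calibration (what Hosmer-Lemeshow tests). A small base-R simulation on made-up data (not the asker's dataset) shows how a monotone distortion of the predicted probabilities leaves AUC untouched while wrecking calibration:

```r
set.seed(42)
n <- 5000
p_true <- plogis(rnorm(n))    # well-calibrated event probabilities
y <- rbinom(n, 1, p_true)
p_miscal <- p_true^2          # monotone distortion: same ranking, wrong scale

# AUC via the Mann-Whitney rank formula; it depends only on the ranking
# of the scores, so any strictly increasing transform gives the same AUC.
auc <- function(y, p) {
  r <- rank(p)
  n1 <- sum(y); n0 <- sum(1 - y)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc(y, p_true)    # identical to ...
auc(y, p_miscal)  # ... this

# Calibration, however, breaks: expected event counts no longer match.
sum(y); sum(p_true); sum(p_miscal)
```

The distorted scores discriminate exactly as well as the true probabilities but systematically under-predict events, which is exactly the "high AUC, failing H-L" pattern in the question.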

0 Answers