R logistic regression area under curve

Question

I am performing logistic regression using this page. My code is as below.

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
summary(mylogit)
prob=predict(mylogit,type=c("response"))
mydata$prob=prob

After running this code mydata dataframe has two columns - 'admit' and 'prob'. Shouldn't those two columns sufficient to get the ROC curve?

How can I get the ROC curve.

Secondly, by loooking at mydata, it seems that model is predicting probablity of admit=1.

Is that correct?

How to find out which particular event the model is predicting?

Thanks

UPDATE: It seems that below three commands are very useful. They provide the cut-off which will have maximum accuracy and then help to get the ROC curve.

coords(g, "best")

mydata$prediction=ifelse(prob>=0.3126844,1,0)

confusionMatrix(mydata$prediction,mydata$admit

Wouldn't it be very simple to test your uncertainty about what is being predicted with a small dataset? Or just look at the results of `with(mydata, table(admit,gre))`? Logistic regression is just estimating over a bunch of tables.) — IRTFM, Aug 26 '13 at 17:21
yes...we can do that way..and i followed the same method to arrive at the conclusion that the current case it is predicting admit=1..but thought that R will have some shortcut which will confirm my thinking. Any comment on finding out the threshold which will give maximum accuracy from roc object? — user2543622, Aug 26 '13 at 17:49
regarding "Any comment on finding out the threshold which will give maximum accuracy from roc object? ": i think that the answer is coords(g, "best")... — user2543622, Aug 26 '13 at 17:57

score 45 · Accepted Answer · answered Aug 26 '13 at 17:00

45

The ROC curve compares the rank of prediction and answer. Therefore, you could evaluate the ROC curve with package pROC as follow:

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
summary(mylogit)
prob=predict(mylogit,type=c("response"))
mydata$prob=prob
library(pROC)
g <- roc(admit ~ prob, data = mydata)
plot(g)

answered Aug 26 '13 at 17:00

wush978

3,114
2
19
23

that makes sense. If possible please answer "Secondly, by loooking at mydata, it seems that model is predicting probablity of admit=1. is that correct? how to find out which particular event the model is predicting?" too. I looked at the roc object and understand that g$sensitivities and g$specificities will give me specific values, but if i want to find out the threshold which will give maximum accuracy then can i get that number from roc object? – user2543622 Aug 26 '13 at 17:10
@wush978 the "admit" variable is the predicted class or the actual class? – M.M Mar 20 '14 at 18:48
2

That URL to get the data now seems to be out of date. For anyone else interested in reproducing this example, what seems to work now is mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv") (using https:// prepended tho' that doesn't want to appear in the comment) – PM. Feb 12 '18 at 11:06
How can you calculate the area under the curve of two linear regressions estimated by a lmer model? Thanks https://stats.stackexchange.com/questions/570145/auc-between-estimated-lines?noredirect=1#comment1051970_570145 – Rosa Maria Apr 03 '22 at 03:21

score 12 · Answer 2 · edited Sep 10 '14 at 14:00

12

another way to plot ROC Curve...

library(Deducer)
modelfit <- glm(formula=admit ~ gre + gpa, family=binomial(), data=mydata, na.action=na.omit)
rocplot(modelfit)

edited Sep 10 '14 at 14:00

Jay

9,585
6
49
72

answered Sep 10 '14 at 13:07

Manoj Kumar

5,273
1
26
33

1

You'll need Java installed for this or you'll get an error, just FYI. `Error: .onLoad failed in loadNamespace() for 'rJava', details: call: fun(libname, pkgname) error: JAVA_HOME cannot be determined from the Registry` – alexpghayes Jun 30 '17 at 23:54

score 5 · Answer 3 · answered Nov 24 '15 at 17:25

5

#Another way to plot ROC

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")   
mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")    
summary(mylogit)     
prob=predict(mylogit,type=c("response"))    
library("ROCR")    
pred <- prediction(prob, mydata$admit)    
perf <- performance(pred, measure = "tpr", x.measure = "fpr")     
plot(perf, col=rainbow(7), main="ROC curve Admissions", xlab="Specificity", 
     ylab="Sensitivity")    
abline(0, 1) #add a 45 degree line

answered Nov 24 '15 at 17:25

Conny

51
1
1

1

Can you add some explanation to your answer? – Joe Kennedy Nov 24 '15 at 17:42
2

@Conny AUC can be calculated as `auc = performance(pred, "auc")` – Learner Mar 17 '17 at 17:33
@SIslam Thank you for your comment! question's title is AUC and instead everybody is talking about ROC. They are related concepts, but not the same. – user1700890 Jun 14 '17 at 19:25

R logistic regression area under curve

3 Answers3