Logistic Regression Using R

Question

I am running logistic regressions using R right now, but I cannot seem to get many useful model fit statistics. I am looking for metrics similar to SAS:

http://www.ats.ucla.edu/stat/sas/output/sas_logit_output.htm

Does anyone know how (or what packages) I can use to extract these stats?

Thanks

Please note that there's also http://stats.stackexchange.com/ — NPE, Sep 01 '11 at 16:55
Not exactly a duplicate, I guess, but definitely related and relevant: http://stackoverflow.com/questions/3439248/logistic-regression-in-r-sas-like-output — joran, Sep 01 '11 at 17:17
All of those stats in the link are available one way or another... What exactly are you looking for? Also, R has various ways of doing regression and they depend on how your data are formatted so expanding your question with what you're looking for, a sample of your data, and maybe even how you're doing your analysis now, will provide you with much better answers to your question. — John, Sep 01 '11 at 17:42
Can you please be more specific about goodness-of-fit statistics in your question? The only ones I noticed in skimming that list were AIC, BIC (SC), and log-likelihood, which are very easy to calculate/display in R. Can you confirm? Or are there others (such as Nagelkerke's R^2, etc.)? — Ben Bolker, Sep 01 '11 at 20:56
specifically, i am looking for AIC, Log-likelihood ratio (LR), degrees of freedom (DF), LR DF, LR Pr > Chi Sq for intercepts and regression coefficents — josephmisiti, Sep 06 '11 at 16:36
OK. It's mildly irritating that you didn't say this in the first place, since that's *not* what I inferred when you said you wanted "useful model fit" statistics ... — Ben Bolker, Sep 06 '11 at 18:09

IRTFM · Answer 1 · 2011-09-02T13:55:00.273

Certainly glm with a family="binomial" argument is the function most commonly used for logistic regression. The default handling of contrasts of factors is different. R uses treatment contrasts and SAS (I think) uses sum contrasts. You can look these technical issues up on R-help. They have been discussed many, many times over the last ten+ years.

I see Greg Snow mentioned lrm in 'rms'. It has the advantage of being supported by several other functions in the 'rms' suite of methods. I would use it , too, but learning the rms package may take some additional time. I didn't see an option that would create SAS-like output.

If you want to compare the packages on similar problems that UCLA StatComputing pages have another resource: http://www.ats.ucla.edu/stat/r/dae/default.htm , where a large number of methods are exemplified in SPSS, SAS, Stata and R.

Actually, SAS also uses treatment contrasts; the difference is that it takes the LAST level as the reference level, instead of the first. You can get SAS-like behaviour in R with `options(contrasts=c("contr.SAS", "contr.poly"))`. — Hong Ooi, Sep 02 '11 at 05:47
@Hong Doi. Thanks for hte correction. Is it SPSS that uses sum contrasts? — IRTFM, Sep 02 '11 at 13:49

Ben Bolker · Accepted Answer · 2011-09-06T18:09:27.467

Here's a Poisson regression example:

## from ?glm:
d.AD <- data.frame(counts=c(18,17,15,20,10,20,25,13,12),
      outcome=gl(3,1,9),
      treatment=gl(3,3))
glm.D93 <- glm(counts ~ outcome + treatment,data = d.AD, family=poisson())

Now define a function to fit an intercept-only model with the same response, family, etc., compute summary statistics, and combine them into a table (matrix). The formula .~1 in the update command below means "refit the model with the same response variable [denoted by the dot on the LHS of the tilde] but with only an intercept term [denoted by the 1 on the RHS of the tilde]"

glmsumfun <- function(model) {
   glm0 <- update(model,.~1)  ## refit with intercept only
   ## apply built-in logLik (log-likelihood), AIC,
   ##  BIC (Bayesian/Schwarz Information Criterion) functions
   ## to models with and without intercept ('model' and 'glm0');
   ## combine the results in a two-column matrix with appropriate
   ## row and column names
   matrix(c(logLik(glm.D93),BIC(glm.D93),AIC(glm.D93),
           logLik(glm0),BIC(glm0),AIC(glm0)),ncol=2,
     dimnames=list(c("logLik","SC","AIC"),c("full","intercept_only")))
}

Now apply the function:

glmsumfun(glm.D93)

The results:

            full intercept_only
logLik -23.38066      -26.10681
SC      57.74744       54.41085
AIC     56.76132       54.21362

EDIT:

anova(glm.D93,test="Chisq") gives a sequential analysis of deviance table containing df, deviance (=-2 log likelihood), residual df, residual deviance, and the likelihood ratio test (chi-squared test) p-value.
drop1(glm.D93) gives a table with the AIC values (df, deviances, etc.) for each single-term deletion; drop1(glm.D93,test="Chisq") additionally gives the LRT test p value.

could you explain what is going on here, i am kinda new to R — josephmisiti, Sep 06 '11 at 16:36

score 2 · Answer 3 · answered Sep 01 '11 at 17:18

2

Using the lrm function in the rms package may give you the output that you are looking for.

answered Sep 01 '11 at 17:18

Greg Snow

48,497
6
83
110

Logistic Regression Using R

3 Answers3