3

I am running logistic regressions using R right now, but I cannot seem to get many useful model fit statistics. I am looking for metrics similar to SAS:

http://www.ats.ucla.edu/stat/sas/output/sas_logit_output.htm

Does anyone know how (or what packages) I can use to extract these stats?

Thanks

camille
  • 16,432
  • 18
  • 38
  • 60
josephmisiti
  • 9,862
  • 11
  • 56
  • 72
  • Please note that there's also http://stats.stackexchange.com/ – NPE Sep 01 '11 at 16:55
  • 2
    Not exactly a duplicate, I guess, but definitely related and relevant: http://stackoverflow.com/questions/3439248/logistic-regression-in-r-sas-like-output – joran Sep 01 '11 at 17:17
  • All of those stats in the link are available one way or another... What exactly are you looking for? Also, R has various ways of doing regression and they depend on how your data are formatted so expanding your question with what you're looking for, a sample of your data, and maybe even how you're doing your analysis now, will provide you with much better answers to your question. – John Sep 01 '11 at 17:42
  • Can you please be more specific about goodness-of-fit statistics in your question? The only ones I noticed in skimming that list were AIC, BIC (SC), and log-likelihood, which are very easy to calculate/display in R. Can you confirm? Or are there others (such as Nagelkerke's R^2, etc.)? – Ben Bolker Sep 01 '11 at 20:56
  • specifically, i am looking for AIC, Log-likelihood ratio (LR), degrees of freedom (DF), LR DF, LR Pr > Chi Sq for intercepts and regression coefficents – josephmisiti Sep 06 '11 at 16:36
  • OK. It's mildly irritating that you didn't say this in the first place, since that's *not* what I inferred when you said you wanted "useful model fit" statistics ... – Ben Bolker Sep 06 '11 at 18:09

3 Answers3

5

Certainly glm with a family="binomial" argument is the function most commonly used for logistic regression. The default handling of contrasts of factors is different. R uses treatment contrasts and SAS (I think) uses sum contrasts. You can look these technical issues up on R-help. They have been discussed many, many times over the last ten+ years.

I see Greg Snow mentioned lrm in 'rms'. It has the advantage of being supported by several other functions in the 'rms' suite of methods. I would use it , too, but learning the rms package may take some additional time. I didn't see an option that would create SAS-like output.

If you want to compare the packages on similar problems that UCLA StatComputing pages have another resource: http://www.ats.ucla.edu/stat/r/dae/default.htm , where a large number of methods are exemplified in SPSS, SAS, Stata and R.

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • 3
    Actually, SAS also uses treatment contrasts; the difference is that it takes the LAST level as the reference level, instead of the first. You can get SAS-like behaviour in R with `options(contrasts=c("contr.SAS", "contr.poly"))`. – Hong Ooi Sep 02 '11 at 05:47
  • @Hong Doi. Thanks for hte correction. Is it SPSS that uses sum contrasts? – IRTFM Sep 02 '11 at 13:49
5

Here's a Poisson regression example:

## from ?glm:
d.AD <- data.frame(counts=c(18,17,15,20,10,20,25,13,12),
      outcome=gl(3,1,9),
      treatment=gl(3,3))
glm.D93 <- glm(counts ~ outcome + treatment,data = d.AD, family=poisson())

Now define a function to fit an intercept-only model with the same response, family, etc., compute summary statistics, and combine them into a table (matrix). The formula .~1 in the update command below means "refit the model with the same response variable [denoted by the dot on the LHS of the tilde] but with only an intercept term [denoted by the 1 on the RHS of the tilde]"

glmsumfun <- function(model) {
   glm0 <- update(model,.~1)  ## refit with intercept only
   ## apply built-in logLik (log-likelihood), AIC,
   ##  BIC (Bayesian/Schwarz Information Criterion) functions
   ## to models with and without intercept ('model' and 'glm0');
   ## combine the results in a two-column matrix with appropriate
   ## row and column names
   matrix(c(logLik(glm.D93),BIC(glm.D93),AIC(glm.D93),
           logLik(glm0),BIC(glm0),AIC(glm0)),ncol=2,
     dimnames=list(c("logLik","SC","AIC"),c("full","intercept_only")))
}

Now apply the function:

glmsumfun(glm.D93)

The results:

            full intercept_only
logLik -23.38066      -26.10681
SC      57.74744       54.41085
AIC     56.76132       54.21362

EDIT:

  • anova(glm.D93,test="Chisq") gives a sequential analysis of deviance table containing df, deviance (=-2 log likelihood), residual df, residual deviance, and the likelihood ratio test (chi-squared test) p-value.
  • drop1(glm.D93) gives a table with the AIC values (df, deviances, etc.) for each single-term deletion; drop1(glm.D93,test="Chisq") additionally gives the LRT test p value.
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
2

Using the lrm function in the rms package may give you the output that you are looking for.

Greg Snow
  • 48,497
  • 6
  • 83
  • 110