24

In analysis of categorical data, we often use logistic regression to estimate relationships between binomial outcomes and one or more covariates.

I understand this is a type of generalized linear model (GLM). In R, this is implemented with the glm function using the argument family=binomial. On the other hand, in categorical data analysis are multinomial models. Are these not GLMs? And can't they be estimated in R using the glm function?

(In this post for Multinomial Logistic Regression. The author uses an external package mlogit, which seems also outdated)

Why is the class of GLMs restricted to dichotomous outcomes? Is it because multi-class classification can be treated as multiple binary classification models?

hxd1011
  • 885
  • 2
  • 11
  • 23
  • Questions about how R works, why certain functions exist & others don't, etc., are off topic here. Note that, because this is not a programming question, it would be off topic on [SO] as well, & should not be migrated there. – gung - Reinstate Monica Feb 07 '17 at 19:21
  • (Actually, given the excellent answer by @AdamO below, which contains substantial statistical content, I'm considering retracting my close vote.) – gung - Reinstate Monica Feb 07 '17 at 19:26
  • @gung I proposed an edit to make the question a bit more relevant. – AdamO Feb 07 '17 at 23:44
  • That's good, @AdamO, but it's probably best for the OP to make edits that change the substance of the question. – gung - Reinstate Monica Feb 08 '17 at 00:20
  • @hxd1011 gung is right. Your question is *almost* statistical. Can you edit it to make it not about R, but about what constitutes GLMs? You can ask about the R function of the same name, but your ultimate confusion whether multicategory logit models were GLMs or not. – AdamO Feb 08 '17 at 13:10
  • @ashkan I will try to edit it and see if we can migrate it back – hxd1011 Feb 08 '17 at 15:03
  • @gung can you help to migrate this post back? – hxd1011 Feb 08 '17 at 18:05
  • I don't believe I can. I once had a Q migrated here (inappropriately, IMO), but couldn't get it sent back. For future reference, you should be aware that questions about how R works, why certain functions exist & others don't, etc., are typically off topic on [stats.SE], so you either want to ask them elsewhere, or be clear that they are statistical questions, not R questions. – gung - Reinstate Monica Feb 08 '17 at 18:20
  • @gung thanks for the information. Let's see if moderator of SE can move this question back – hxd1011 Feb 08 '17 at 18:23

1 Answers1

39

The GLMs in R are estimated with Fisher Scoring. Two approaches to multi-category logit come to mind: proportional odds models and log-linear models or multinomial regression.

The proportional odds model is a special type of cumulative link model and is implemented in the MASS package. It is not estimated with Fisher scoring, so the default glm.fit work-horse would not be able to estimate such a model. Interestingly, however, cumulative link models are GLMs and were discussed in the eponymous text by McCullogh and Nelder. A similar issue is found with negative binomial GLMs: they are GLMs in the strict sense of a link function, and a probability model, but require specialized estimation routines. As far as the R function glm, one should not look at it as an exhaustive estimator for every type of GLM.

nnet has an implementation of a loglinear model estimator. It is conformed to their more sophisticated neural net estimator using soft-max entropy, which is an equivalent formulation (theory is there to show this). It turns out you can estimate log-linear models with glm in default R if you're keen. The key lies in seeing the link between logistic and poisson regression. Recognizing the interaction terms of a count model (difference in log relative rates) as a first order term in a logistic model for an outcome (log odds ratio), you can estimate the same parameters and the same SEs by "conditioning" on the margins of the $K \times 2$ contingency table for a multi-category outcome. A related SE question on that background is here

Take as an example the following using the VA lung cancer data from the MASS package:

> summary(multinom(cell ~ factor(treat), data=VA))
# weights:  12 (6 variable)
initial  value 189.922327 
iter  10 value 182.240520
final  value 182.240516 
converged
Call:
multinom(formula = cell ~ factor(treat), data = VA)

Coefficients:
    (Intercept) factor(treat)2
2  6.931413e-01     -0.7985009
3 -5.108233e-01      0.4054654
4 -9.538147e-06     -0.5108138

Std. Errors:
  (Intercept) factor(treat)2
2   0.3162274      0.4533822
3   0.4216358      0.5322897
4   0.3651485      0.5163978

Residual Deviance: 364.481 
AIC: 376.481 

Compared to:

> VA.tab <- table(VA[, c('cell', 'treat')])
> summary(glm(Freq ~ cell * treat, data=VA.tab, family=poisson))

Call:
glm(formula = Freq ~ cell * treat, family = poisson, data = VA.tab)

Deviance Residuals: 
[1]  0  0  0  0  0  0  0  0

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)   2.708e+00  2.582e-01  10.488   <2e-16 ***
cell2         6.931e-01  3.162e-01   2.192   0.0284 *  
cell3        -5.108e-01  4.216e-01  -1.212   0.2257    
cell4        -1.571e-15  3.651e-01   0.000   1.0000    
treat2        2.877e-01  3.416e-01   0.842   0.3996    
cell2:treat2 -7.985e-01  4.534e-01  -1.761   0.0782 .  
cell3:treat2  4.055e-01  5.323e-01   0.762   0.4462    
cell4:treat2 -5.108e-01  5.164e-01  -0.989   0.3226    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1.5371e+01  on 7  degrees of freedom
Residual deviance: 4.4409e-15  on 0  degrees of freedom
AIC: 53.066

Number of Fisher Scoring iterations: 3

Compare the interaction parameters and the main levels for treat in the one model to the second. Compare also the intercept. The AICs are different because the loglinear model is a probability model for even the margins of the table which are conditioned upon by other parameters in the model, but in terms of prediction and inference these two approaches yield identical results.

So in short, trick question! glm handles multi-category logistic regression, it just takes a greater understanding of what constitutes such models.

Community
  • 1
  • 1
AdamO
  • 4,283
  • 1
  • 27
  • 39
  • Great answer with code demo that enables me to see things differently never know the connection between possion and multi-class classification! – hxd1011 Feb 07 '17 at 17:59
  • Here, both `cell` and `treat` are categorical; noting that one can formulate `nnet`'s `multinom` as a GLM, does the analogy with Poisson regression naturally extend to the case where the factor variable is numerical? – fuglede May 21 '18 at 11:22
  • @fuglede no b/c only for the OR does OR(x,y) = OR(y,x). This is reflected in the interaction term: x:y= y:x. The association measure in a loglinear model is different when x is continuous valued, – AdamO May 21 '18 at 13:01
  • Hm, browsing around for the same thing, I came across http://data.princeton.edu/wws509/notes/c6.pdf whose Section 6.2.5 seems to suggest that you can do something if the interaction is kept in a particular form. – fuglede May 21 '18 at 20:21
  • If one fitted separate logistic / binomial models to model the probability of the outcome being 1/all other categories, 2/all other categories, etc and one would then rescale these coefficients so that the total probability of the outcome to belong to each of the categories would sum to 1, would that not be equivalent to a multinomial distribution though? – Tom Wenseleers Jan 07 '20 at 21:29
  • Or is the problem that in separate logistic models a different intercept term would be fitted, whereas in a proper multinomial model one would have a single intercept/baseline? I see that a similar question was asked here https://stats.stackexchange.com/questions/52104/multinomial-logistic-regression-vs-one-vs-rest-binary-logistic-regression – Tom Wenseleers Jan 07 '20 at 21:36