49

Problem: I need to estimate a set of multinomial logistic multilevel models and can’t find an appropriate R package. What is the best R package to estimate such models? STATA 13 recently added this feature to their multilevel mixed-effects models – so the technology to estimate such models seems to be available.

Details: A number of research questions require the estimation of multinomial logistic regression models in which the outcome variable is categorical. For example, biologists might be interested to investigate which type of trees (e.g., pine trees, maple trees, oak trees) are most impacted by acid rain. Market researchers might be interested whether there is a relationship between the age of customers and the frequency of shopping at Target, Safeway, or Walmart. These cases have in common that the outcome variable is categorical (unordered) and multinomial logistic regressions are the preferred method of estimation. In my case, I am investigating differences in types of human migration, with the outcome variable (mig) coded 0=not migrated, 1=internal migration, 2=international migration. Here is a simplified version of my data set:

migDat=data.frame(hhID=1:21,mig=rep(0:2,times=7),age=ceiling(runif(21,15,90)),stateID=rep(letters[1:3],each=7),pollution=rep(c("high","low","moderate"),each=7),stringsAsFactors=F)

   hhID mig age stateID pollution
1     1   0  47       a      high
2     2   1  53       a      high
3     3   2  17       a      high
4     4   0  73       a      high
5     5   1  24       a      high
6     6   2  80       a      high
7     7   0  18       a      high
8     8   1  33       b       low
9     9   2  90       b       low
10   10   0  49       b       low
11   11   1  42       b       low
12   12   2  44       b       low
13   13   0  82       b       low
14   14   1  70       b       low
15   15   2  71       c  moderate
16   16   0  18       c  moderate
17   17   1  18       c  moderate
18   18   2  39       c  moderate
19   19   0  35       c  moderate
20   20   1  74       c  moderate
21   21   2  86       c  moderate

My goal is to estimate the impact of age (independent variable) on the odds of (1) migrating internally vs. not migrating, (2) migrating internationally vs. not migrating, (3) migrating internally vs. migrating internationally. An additional complication is that my data operate at different aggregation levels (e.g., pollution operates at the state-level) and I am also interested in predicting the impact of air pollution (pollution) on the odds of embarking on a particular type of movement.

Clunky solutions: One could estimate a set of separate logistic regression models by reducing the data set for each model to only two migration types (e.g., Model 1: only cases coded mig=0 and mig=1; Model 2: only cases coded mig=0 and mig=2; Model 3: only cases coded mig=1 and mig=2). Such a simple multilevel logistic regression model could be estimated with lme4 but this approach is less ideal because it does not appropriately account for the impact of the omitted cases. A second solution would be to run multinomial logistic multilevel models in MLWiN through R using the R2MLwiN package. But since MLWiN is not open source and the generated object difficult to use, I would prefer to avoid this option. Based on a comprehensive internet search there seem to be some demand for such models but I am not aware of a good R package. So it would be great if some experts who have run such models could provide a recommendation and if there are more than one package maybe indicate some advantages/disadvantages. I am sure that such information would be a very helpful resource for multiple R users. Thanks!!

Best, Raphael

Raphael
  • 1,143
  • 1
  • 11
  • 16
  • 13
    two suggestions: (1) look into the `MCMCglmm` package; (2) your "clunky method" is actually the standard method (e.g. see Dobson and Barnett *Introduction to Generalized Linear Models*, 3d ed.); one parameterizes a multinomial model as series of binomial contrasts (level 1 vs level 2, level 1 vs level 3) and fit a series of models. This is actually a complete model because any two-category subset of a multinomial model is conditionally binomial (i.e. if you know it's A or B, then A is a binomial sample from (A+B)); any complete set of pairs is a valid parameterization. – Ben Bolker Jan 13 '14 at 01:18
  • 2
    In your case, since your categories are *somewhat* ordered I would probably parameterize them as (no migration vs. internal or international migration), (internal vs international migration); this also sets you up for a comparison with an ordinal model (see the `ordinal` package). – Ben Bolker Jan 13 '14 at 01:20
  • Thanks a lot, Ben Bolker! Both suggestions are indeed very helpful and I will explore them more. – Raphael Jan 14 '14 at 00:45

7 Answers7

38

There are generally two ways of fitting a multinomial models of a categorical variable with J groups: (1) Simultaneously estimating J-1 contrasts; (2) Estimating a separate logit model for each contrast.

Produce these two methods the same results? No, but the results are often similar

Which method is better? Simultaneously fitting is more precise (see below for an explanation why)

Why would someone use separate logit models then? (1) the lme4 package has no routine for simultaneously fitting multinomial models and there is no other multilevel R package that could do this. So separate logit models are presently the only practical solution if someone wants to estimate multilevel multinomial models in R. (2) As some powerful statisticians have argued (Begg and Gray, 1984; Allison, 1984, p. 46-47), separate logit models are much more flexible as they permit for the independent specification of the model equation for each contrast.

Is it legitimate to use separate logit models? Yes, with some disclaimers. This method is called the “Begg and Gray Approximation”. Begg and Gray (1984, p. 16) showed that this “individualized method is highly efficient”. However, there is some efficiency loss and the Begg and Gray Approximation produces larger standard errors (Agresti 2002, p. 274). As such, it is more difficult to obtain significant results with this method and the results can be considered conservative. This efficiency loss is smallest when the reference category is large (Begg and Gray, 1984; Agresti 2002). R packages that employ the Begg and Gray Approximation (not multilevel) include mlogitBMA (Sevcikova and Raftery, 2012).


Why is a series of individual logit models imprecise? In my initial example we have a variable (migration) that can have three values A (no migration), B (internal migration), C (international migration). With only one predictor variable x (age), multinomial models are parameterized as a series of binomial contrasts as follows (Long and Cheng, 2004 p. 277):

Eq. 1:  Ln(Pr(B|x)/Pr(A|x)) = b0,B|A + b1,B|A (x) 
Eq. 2:  Ln(Pr(C|x)/Pr(A|x)) = b0,C|A + b1,C|A (x)
Eq. 3:  Ln(Pr(B|x)/Pr(C|x)) = b0,B|C + b1,B|C (x)

For these contrasts the following equations must hold:

Eq. 4: Ln(Pr(B|x)/Pr(A|x)) + Ln(Pr(C|x)/Pr(A|x)) = Ln(Pr(B|x)/Pr(C|x))
Eq. 5: b0,B|A + b0,C|A = b0,B|C
Eq. 6: b1,B|A + b1,C|A = b1,B|C

The problem is that these equations (Eq. 4-6) will in praxis not hold exactly because the coefficients are estimated based on slightly different samples since only cases from the two contrasting groups are used und cases from the third group are omitted. Programs that simultaneously estimate the multinomial contrasts make sure that Eq. 4-6 hold (Long and Cheng, 2004 p. 277). I don’t know exactly how this “simultaneous” model solving works – maybe someone can provide an explanation? Software that do simultaneous fitting of multilevel multinomial models include MLwiN (Steele 2013, p. 4) and STATA (xlmlogit command, Pope, 2014).


References:

Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Allison, P. D. (1984). Event history analysis. Thousand Oaks, CA: Sage Publications.

Begg, C. B., & Gray, R. (1984). Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika, 71(1), 11-18.

Long, S. J., & Cheng, S. (2004). Regression models for categorical outcomes. In M. Hardy & A. Bryman (Eds.), Handbook of data analysis (pp. 258-285). London: SAGE Publications, Ltd.

Pope, R. (2014). In the spotlight: Meet Stata's new xlmlogit command. Stata News, 29(2), 2-3.

Sevcikova, H., & Raftery, A. (2012). Estimation of multinomial logit model using the Begg & Gray approximation.

Steele, F. (2013). Module 10: Single-level and multilevel models for nominal responses concepts. Bristol, U.K,: Centre for Multilevel Modelling.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
Raphael
  • 1,143
  • 1
  • 11
  • 16
  • 2
    This is correct and very useful. Two added benefits of running models separately is that (1) the computation-to-output time is shorter for each contrast (especially helpful for larger data sets and more complex models) and (2) running separate models encourages examining contrasts that might be ignored (rather than using a single reference category). – statsRus Jun 26 '14 at 13:43
22

An older question, but I think a viable option has recently emerged is brms, which uses the Bayesian Stan program to actually run the model For example, if you want to run a multinomial logistic regression on the iris data:

b1 <- brm (Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width,
           data=iris, family="categorical",
           prior=c(set_prior ("normal (0, 8)")))

And to get an ordinal regression -- not appropriate for iris, of course -- you'd switch the family="categorical" to family="acat" (or cratio or sratio, depending on the type of ordinal regression you want) and make sure that the dependent variable is ordered.

Clarification per Raphael's comment: This brm call compiles your formula and arguments into Stan code. Stan compiles it into C++ and uses your system's C++ compiler -- which is required. On a Mac, for example, you may need to install the free Developer Tools to get C++. Not sure about Windows. Linux should have C++ installed by default.)

Clarification per Qaswed's comment: brms easily handles multilevel models as well using the R formula (1 | groupvar) to add a group (random) intercept for a group, (1 + foo | groupvar) to add a random intercept and slope, etc.

Wayne
  • 933
  • 7
  • 11
  • Does "requires C++" mean that I have to know how to code in C++ or just install a certain program? Also, how would you specify the random effects/levels in the above brm call if observations are clustered - in the iris data set for example clustering by "Petal.Width"? – Raphael Jun 20 '16 at 19:31
  • A quick note on installing brm on linux: I had dependency issues stemming from needing the PKI package. Installing the development ssl package did this `sudo apt-get install libssl-dev` – Simon Oct 11 '16 at 11:01
  • 1
    Could you tell a bit more about setting priors to `normal(0,8)`? I understand this means a normal distribution with mean of 0 and sd of 8, I just wonder if this is a relatively safe choice with a simple model, which is expected to behave well. I tried to look into Gelman's guides on weakly informative priors but it is beyond me. – Szasulja Aug 13 '18 at 09:36
  • 1
    @Szasulja I'm not an expert, and can't say that `normal (0, 8)` is principled. Look at `curve (dnorm (x, 0, 8), from=-20, to=20)` and note that `qnorm (c(0.05, 0.95), 0, 8)` is `-13.15883 13.15883` which basically gives the slopes and intercepts plenty of room but eliminates highly-unlikely values. – Wayne Aug 13 '18 at 12:51
  • @Qaswed: I've edited the answer to take this into account. Thanks! (`brms` easily handles multilevel models.) – Wayne Nov 29 '18 at 12:50
  • Be careful of factor labels. I ran into trouble with some exotic labels "^" and "@". I renamed them to something more vanilla ("x" and "q" respectively). Got syntax error with the originals. – tmn Oct 30 '19 at 23:28
2

I'm puzzled that this technique is descried as "standard" and "equivalent", though it might well be a good practical solution. (Guess I'd better to check out the Allison and Dobson & Barnett references). For the simple multinomial case ( no clusters, repeated measures etc.) Begg and Gray (1984) propose using k-1 binomial logits against a reference category as an approximation (though a good one) in many cases to full blown multinomial logit. They demonstrate some loss of efficiency when using a single reference category, though it's small for cases where a single high-frequency baseline category is use as the reference. Agresti (2002: p. 274) provides an example where there is a small increase in standard errors even when the baseline category constitutes over 70% of 219 cases in a five category example.

Maybe it's no big deal, but I don't see how the approximation would get any better adding a second layer of randomness.

References
Agresti, A. (2002). Categorical data analysis. Hoboken NJ: Wiley.

Begg, C. B., & Gray, R. (1984). Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika, 71(1), 11–18.

tmn
  • 81
  • 1
  • 5
1

I will recommend you to use the package "mlogit"

Anom
  • 19
  • 2
  • 1
    maybe give a link (http://cran.r-project.org/web/packages/mlogit/) and some description of the algorithm used by the package? (Link-only answers are discouraged, although this is certainly a useful pointer.) – Ben Bolker May 17 '14 at 17:51
  • 3
    To my knowledge the mlogit package does not allow to include random effects or specify a multilevel structure. So I can't use this package for my research. – Raphael May 21 '14 at 21:55
  • 2
    The `mlogit` package clearly handles (at least two-level, e.g. repeated measures or panel data) mixed-effect multinomial models. Its algorithms are based on simulated maximum likelihood methods for random utility models as thoroughly documented in Train (2009). The methods are essentially the same as those called "fixed-sample simulated likelihood" by Demidenko (2005). **References** Demidenko, E. (2005). Mixed Models: Theory and Applications. Wiley-Interscience.Train, K. (2009). Discrete Choice Methods with Simulation (Second Edition). Cambridge: Cambridge University Press. – tmn Nov 13 '17 at 18:49
  • Try bayesm::rhierMnlRwMixture. It will handle simple longitudinal/ repeated-measures models with (mixture-)Gaussian random slopes and intercepts. There's a book with many examples: Rossi, P. E., & Allenby, G. M. (2003). Bayesian statistics and marketing. Marketing Science, 22(3), 304-328. – tmn Mar 28 '18 at 17:45
  • For a Bayesian approach: Try package `bayesm`. [link] (https://cran.r-project.org/web/packages/bayesm/index.html). The function `bayesm::rhierMnlRwMixture` will handle simple longitudinal/ repeated-measures models with (mixture-)Gaussian random slopes and intercepts. There's a book with many examples by the package authors: Rossi, P. E., & Allenby, G. M. (2003). Bayesian statistics and marketing. Marketing Science, 22(3), 304-328. – tmn Mar 28 '18 at 17:56
1

I am dealing with the same issue and one possible solution I found seems to resort to the poisson (loglinear/count) equivalent of the multinomial logistic model as described in this mailinglist, these nice slides or in Agresti (2013: 353-356). Thus, it should be possible to use the glmer(... family=poisson) function from the package lme4 with some aggregation of the data.

Reference:
Agresti, A. (2013) Categorical data analysis. Hoboken, NJ: Wiley.

  • Could you briefly clarify why you think that it is possible to use a poisson model for a multinomial outcome? (I skimmed through the mailing list and slides but don't fully understand this approach). Different values on a nominal outcome variable mean completely different things (e.g., 0= no migration, 1=domestic migration; 2=international migration). Poisson models treat the assigned values as counts so I think that the coefficient estimates of such a model would be fairly meaningless. – Raphael Jun 20 '16 at 19:23
  • 1
    Here are a couple of references on what is sometimes called the 'binomial-multinomial transform'. 1) Baker, S. G. (1994). The multinomial-Poisson transformation. The Statistician, 43(4), 495–504. 2) Guimaraes, P. (2004). Understanding the multinomial-Poisson transformation. Stata Journal, 4, 265–273. The method requires adding an indicator variable in the predictor for every unique combination of explanatory variables. It works well enough for small models, but gets very clumsy otherwise. – tmn Nov 13 '17 at 22:22
1

Since I had the same problem, I recently came across this question. I found out this package called ordinal having this cumulative link mixed model function (clmm2) that seems similar to the proposed brm function, but using a frequentist approach.

Basically, you would need to set the link function (for instance as logit), you can choose to have nominal variables (meaning, those variables that are not fulfilling the proportional odds assumption), set the threshold to "flexible" if you want to allow having unstructured cut-points, and finally add the argument "random" for specifying any variable that you would like to have with a random effect.

I found also the book Multilevel Modeling Using R, W. Holmes Finch Jocelyn E. Bolin, Ken Kelley and they illustrate how to use the function from page 151, with nice examples.

s.cerioli
  • 135
  • 1
  • 7
0

Here's an implementation (not my own). I'd just work off this code. Plus, this way you'll really know what's going on under the hood.

http://www.nhsilbert.net/docs/rcode/multilevel_multinomial_logistic_regression.R

  • 1
    Thanks for sharing! But I guess that my statistical knowledge is too limited to work with a raw code like this. The code would have to be a lot more annotated for me to know why they do what they do and to be sure there are no problems/errors present. It appears they are using a Monte Carlo Approach similar to the MCMCglmm package mentioned by Ben Bolker above, but I am not quite sure... – Raphael Jan 14 '14 at 00:56
  • Fair enough! Do you have to do your analysis in R? You could just do you analysis in Stata? Otherwise, I can't think of anything else. – Henry David Thorough Jan 14 '14 at 05:38
  • Yes, STATA would be my last resort. But since I do all the variable construction and data preprocessing in R, I would like to stick with one Software. – Raphael Jan 15 '14 at 16:55
  • 1
    I might be misunderstanding the problem, but why don't you just write out the processed dataframe with all of the necessary variables as a csv then import it into Stata? – Henry David Thorough Jan 15 '14 at 18:42