Run nested logit regression in R

Question

I want to run a nested logistic regression in R, but the examples I found online didn't help much. I read through an example on this website (Step by step procedure on how to run nested logistic regression in R) that is similar to my problem, but it does not seem to have been resolved in the end (the questioner reported errors and I didn't see any further answers).

I have 9 predictors (continuous scores) and 1 categorical dependent variable (DV). The DV is called "effect", and it can be divided into 2 general categories: "negative (0)" and "positive (1)". I know how to run a simple binary logit regression using this general grouping (negative (0) vs. positive (1)), but this is not enough. "positive" can be further divided into two types: "physical (1)" and "mental (2)". So I want to run a nested model that includes all 3 categories (negative (0), physical (1), and mental (2)) and reflects the fact that "physical" and "mental" are nested within "positive". Maybe R can also compare the two models (general vs. detailed)? I created two new columns: one called "effect general", whose values are "negative (0)" and "positive (1)", and another called "effect detailed", which contains 3 values: negative (0), physical (1), and mental (2). I ran a simple binary logit regression using only "effect general", but I don't know how to run a nested logit model for "effect detailed".

From the example I found and other materials, the R package "mlogit" seems right, but I'm stuck on how to make it work for my data. I don't quite understand the examples in R-help, and this part of the example from the website I mentioned earlier (...shape='long', alt.var='town.list', nests=list(town.list)...) confuses me: I can see that my data shape should be 'wide', but I have no idea what "alt.var" and "nests" are...

I also looked at page 19 of the mlogit manual for examples of nested logit model calls, but I still cannot decide which options I need. (http://cran.r-project.org/web/packages/mlogit/mlogit.pdf)

Could someone provide me with detailed steps and notes on how to do it? I'm sure this example (if well discussed and resolved) is also going to help me and others a lot!

Thanks for your help!!!

I think you are confused in your understanding of nested models. They do not cover the situation of a DV with two levels within "positive". You have a multinomial outcome. — IRTFM, Mar 27 '13 at 05:16
It seems you can have 3 outcomes. Next you need to decide whether the outcomes are ordered. In other words, is the mental outcome more (or less) severe than the physical outcome? If the outcomes are not ordered and just need to be different, that will make things simpler. I may be able to dig up the code for either case tomorrow if nobody else has provided it by then. — Mark Miller, Mar 27 '13 at 07:07
Thanks Mark! The outcomes are not ordered, so the mental and physical outcomes are equally severe. They are just different categories. I actually did run a multinomial logit regression, but as mentioned in some of the nested model literature, a multinomial model may not work well when the types are not at the same level, so a nested model should be a better choice. The literature also mentions other properties of the multinomial model, see below:) — Ferrari, Mar 27 '13 at 13:50
"MNL may not work well in either of the following cases due to its IIA property: 1) When alternatives are not independent. i.e., when there are groups of alternatives which are more similar than others, such as public transport modes versus the private vehicles. — Ferrari, Mar 27 '13 at 13:54
2) When there are taste variations among individuals. i.e., perceptions of individuals vary with their socioeconomic status. In such cases we require random coefficient models rather than mean value models as the MNL." — Ferrari, Mar 27 '13 at 13:55
Hi DWin, I thought I might be confusing the nested model with the multinomial model too, but from the materials that I found, both models can be applied to my data, and the nested logit model would be the better one. Someone suggested I try a "generalized linear mixed model", but the examples in R seem to be a different thing: their "nestedness" is in the IVs (predictors), not in the outcomes. See the following R code I found in the "lme4" package: — Ferrari, Mar 27 '13 at 14:05
## generalized linear mixed model
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), family = binomial, data = cbpp))
## GLMM with individual-level variability (accounting for overdispersion)
cbpp$obs <- 1:nrow(cbpp)
(gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd) + (1 | obs), family = binomial, data = cbpp))
anova(gm1, gm2)
— Ferrari, Mar 27 '13 at 14:06

score 0 · Answer 1 · answered Oct 22 '14 at 11:30

I can help you understand the mlogit structure. When using the mlogit.data() command, specify choice = yourchoicevariable (and id.var = respondentid if you have a panel dataset, i.e. multiple responses from the same individual), along with the shape='wide' argument. The new data.frame will be in long format, with one row per alternative (negative, physical, mental) for each choice situation, so you will have 3 rows where you had only one in the wide format. Whatever your multinomial choice variable is, it will now be a column of logical values, with TRUE in the row for the alternative the respondent chose. The row names will be in the format observation#.level(choice variable). So in your case, if the person in the first row of your dataset had a response of negative, you would see:

row.name   | choice
1.negative | TRUE
1.physical | FALSE
1.mental   | FALSE
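
To see what that long layout looks like without installing anything, here is a rough base-R sketch that builds the same shape by hand. It only mimics the layout described above; it is not how mlogit.data() is implemented, and the data frame `wide`, the column `score1`, and all the values are invented for illustration.

```r
# Toy wide data: one row per respondent (made-up values, hypothetical names).
wide <- data.frame(
  effect = factor(c("negative", "mental", "physical"),
                  levels = c("negative", "physical", "mental")),
  score1 = c(0.2, 1.5, -0.7)
)

alts <- levels(wide$effect)

# One row per (observation, alternative) pair; alt varies fastest.
long <- expand.grid(alt = alts, chid = seq_len(nrow(wide)),
                    stringsAsFactors = FALSE)

# TRUE only on the row for the alternative the respondent actually chose.
long$choice <- long$alt == as.character(wide$effect[long$chid])

# Individual-level predictors are simply repeated across the alternatives.
long$score1 <- wide$score1[long$chid]

# Row names in the chid.alt pattern, e.g. "1.negative".
rownames(long) <- paste(long$chid, long$alt, sep = ".")

print(long)
```

Each respondent ends up with three rows and exactly one TRUE in the choice column, which is the structure the nested model is estimated on.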

Also note that the actual factor level for each choice is stored in an index called alt of the mlogit.data.frame, which you can see with index(your.data.frame), and the observation number (i.e. the row number from your wide-format data.frame) is stored in chid. That is in essence what the row name is telling you, i.e. chid.alt. Also note that you DO NOT have to specify alt.var if your data is in wide format, only in long format. The mlogit.data function does that for you, as I have just described: it takes unique(choice) when you specify your choice variable and creates the alt.var for you, so specifying it is redundant for wide data.

You then specify the nests by adding to the mlogit() command a named list of the nests like this, assuming your factor levels are just '0','1','2':

mlogit(..., nests = list(negative = c('0'), positive = c('1','2')))

or, if the factor levels were 'negative','physical','mental', it would look like this:

mlogit(..., nests = list(negative = c('negative'), positive = c('physical','mental')))

Also note that a nest of one MUST still be specified with a c() argument, per the package documentation. The resulting model will have a single iv (inclusive value) estimate shared by all nests if you specify un.nest.el=TRUE, or nest-specific estimates if un.nest.el=FALSE. You may find Kenneth Train's examples useful.
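
To make the iv parameter a bit more concrete, here is a minimal base-R sketch of the standard nested logit choice probabilities (the textbook form, as in Train) for the three outcomes in the question. The function name `nested_probs`, the utilities in `V`, and the lambda values are made up for illustration. This is not mlogit's internal code, and it assumes the usual parametrization in which lambda = 1 collapses to the plain multinomial logit.

```r
# Nested logit probabilities for nests {negative} and {physical, mental}.
# V: named numeric vector of systematic utilities (hypothetical values).
# lambda: iv parameter (nest elasticity) of the "positive" nest.
nested_probs <- function(V, lambda) {
  # Scaled exponentiated utilities inside the positive nest.
  pos <- exp(V[c("physical", "mental")] / lambda)
  nest_pos <- sum(pos)^lambda
  # For a one-alternative nest, (exp(V / l))^l = exp(V), so its own iv drops out.
  nest_neg <- exp(V[["negative"]])
  denom <- nest_neg + nest_pos
  # P(alt) = P(alt | nest) * P(nest)
  c(negative = nest_neg / denom,
    (pos / sum(pos)) * (nest_pos / denom))
}

V <- c(negative = 0, physical = 0.5, mental = 0.2)

p_nl  <- nested_probs(V, lambda = 0.5)  # physical and mental are correlated
p_mnl <- nested_probs(V, lambda = 1)    # no within-nest correlation

print(round(p_nl, 3))
print(round(p_mnl, 3))

# With lambda = 1 the result matches the multinomial logit probabilities.
print(all.equal(unname(p_mnl), unname(exp(V) / sum(exp(V)))))
```

The comment about the one-alternative nest is why the singleton "negative" nest has no separately identifiable iv: in this setup only the parameter of the "positive" nest affects the probabilities.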
