6

Please provide me with detailed (as possible) steps on how to do nested logistic regression in R. I'm new to R so it would help me a lot if i can get a detailed answer.

We tested how fisher's decision to exit the fishery is affected by different socioeconomic factors. Dependent variable: 0 - stay; 1 - exit Predictors: Age (continuous); Education (categorical); number of children (continuous), etc.

Our respondents were from different towns. A reviewer of our paper instructed us to account for the town where the respondent comes from by using nested logistic regression.

Your help is very much appreciated. Many thanks.

Richard Muallil
  • 69
  • 1
  • 1
  • 2
  • 2
    What have you tried thus far? Did you estimate the first set of models in R? Have you tried a nested structure and can't make R estimate it? Can you provide us some sample data? If you have the data in R, run `dput(myData)` and copy the contents back into the question. – Chase May 06 '11 at 03:15
  • Hi! so far, i did logit regression without nesting using systat software. Our reviewer suggested using R (which I'm not very familiar with) as it can do nested logistic regression. Thanks much – Richard Muallil May 06 '11 at 07:18

2 Answers2

11
install.packages("mlogit")    
library(mlogit)

my.data <- YOUR.DATA    

nested.logit <- mlogit(stay.exit~ age + education + children , my.data,
shape='long', alt.var='town.list', nests=list(town.list))

See page 19 of the mlogit manual for examples of nested logit model calls. You'll have to review the documentation yourself to ensure you're getting at what you need in terms of options. http://cran.r-project.org/web/packages/mlogit/mlogit.pdf

Segue: I usually like to have a peek at all models by town.list before I look at nested models:

Note: if your categorical variables are not factored you'll have to surround them with as.factor(variable) in your model formula

# Show a little love for plyr
library(plyr)

## RNG
set.seed(123454321)

## Create a list object to store your models
my.models <- list()

## import your data
my.data <- YOUR.DATA

## Create a loop that runs by the list of towns
for(x in 1:length(mydata$town.list) {
## subset data in each step by the town
dat <- subset(my.data, town == town.list[x])
## Save the model to it's own place in the list, identified by town
my.models[[town.list[x]]] <- glm(formula = stay.exit ~ age + education + children, 
family = binomial(link = "logit"), 
data=dat)
}

## View summaries for all models
llply(my.models, summary)

## Access specific models
my.models$<TOWN NAME>
Brandon Bertelsen
  • 43,807
  • 34
  • 160
  • 255
  • 7
    Show plyr some more love and replace that for loop with a call to `dlply()`: `dlply(my.data, "town.list", function(x) glm(dep ~ ind1 + ind2, family = "binomial(link = "logit"), data = x)` – Chase May 06 '11 at 03:48
  • Ooo, I've never seen that one before! Thanks for the pointer! – Brandon Bertelsen May 06 '11 at 04:00
  • 2
    Yep yep. Also take a look at this [post](http://stats.stackexchange.com/questions/6856/aggregating-results-from-linear-model-runs-r) for some alternatives to summarizing lists of models. I've been using the `memisc` package personally, but a few options in that post. – Chase May 06 '11 at 04:02
  • Thank you very much Mr. Bertelsen. This is really big help for me. This is just what i've been looking for. I can now start reanalysing our data. – Richard Muallil May 06 '11 at 07:21
  • Dear Mr. Brandon Bertelsen and others, I tried to run my data based on the script (above) that you provided me but got this error result: Error in `row.names<-.data.frame`(`*tmp*`, value = c("1.1", "1.1", "1.1", : duplicate 'row.names' are not allowed – Richard Muallil May 10 '11 at 11:56
  • In addition: Warning message: non-unique values when setting 'row.names': ‘1.1’, ‘1.5’, ‘10.1’, ‘11.1’, ‘12.1’, ‘13.2’, ‘14.2’, ‘15.2’, ‘16.2’, ‘17.2’, ‘18.2’, ‘19.2’, ‘2.1’, ‘20.2’, ‘21.2’, ‘22.2’, ‘23.2’, ‘24.2’, ‘25.2’, ‘26.3’, ‘27.3’, ‘28.3’, ‘29.3’, ‘3.1’, ‘30.3’, ‘31.3’, ‘32.3’, ‘33.3’, ‘34.3’, ‘35.3’, ‘36.3’, ‘37.3’, ‘38.5’, ‘39.5’, ‘4.1’, ‘40.5’, ‘41.5’, ‘42.5’, ‘43.5’, ‘44.5’, ‘45.5’, ‘46.5’, ‘47.5’, ‘48.5’, ‘49.5’, ‘5.1’, ‘6.1’, ‘7.1’, ‘8.1’, ‘9.1’ – Richard Muallil May 10 '11 at 11:56
  • i suspect that there's problem with my inputs on alt.var and nests=list. Should i do some preliminary manipulation with my data in R before i run the nested logit script? Thank you very much for your patience. here's the actual script that i used: > nested.logit<-mlogit(P5k~as.factor(Ttype)+as.factor(Age)+as.factor(Educ)+as.factor(Child)+as.factor(Catch)+as.factor(Fyears), RPractice, shape='long', alt.var='Tcode', nests=list('Tcode')) – Richard Muallil May 10 '11 at 11:57
  • Sounds like you have a problem with your import. Perhaps it's pulling in a value field as row names rather than a column. Could you provide us with a bit of test data in whatever format you're working on it in? – Brandon Bertelsen May 15 '11 at 03:32
5

If I understand correctly, you want a varying-intercept model by town, i.e., a hierarchical model? If that's right, just use lme4 package.

Here is an example. Assuming you have in your data frame a variable (factor) called town, and that your data frame is called "fish", just run:

library(lme4)
library(arm) # to use the function display, much better than summary
nest.reg <- glmer(decision ~ age + education + children + (1|town), family = binomial, data = fish)
coef(nest.reg) # this will give the estimated coeficients by town (in this case, only the intercepts will vary).
fixef(nest.reg) # this will give the model averaging over all towns.
ranef(nest.reg) # the errors (specificity) at the town level. If you sum fixef with ranef you will get coef results

Finnaly, it's important to compare the estimated within and between town variation

display(nest.reg) # this will show you, among other things, the estimated residual variatio at the town and individual level. It's the error terms by town and by individual (Residual). The ratio between them will tell you how much information the is between town and within each town. 

Take a look at the last edition of Gelman and Hill's book for more information about multilevel regression using lme4.

Ps.: It's possible to include varying-slope by town as well. If that's what you need, just ask in the comments.

Manoel Galdino
  • 2,376
  • 6
  • 27
  • 40
  • Yes! Hierarchical modelling was suggested for our data analysis. We are using logit regression instead of linear regression though since our dependent variable is binary (e.g. exit or stay). Thank you very much. – Richard Muallil May 06 '11 at 07:25
  • If you have any questions about hierarchical modeling with binary dependent variable, just ask at the sister site: http://stats.stackexchange.com/. I'll be glad to answer over there. – Manoel Galdino May 06 '11 at 14:36
  • If you ask at the sister site, use the tag multilevel-analysis – Manoel Galdino May 06 '11 at 14:43