
I have a dataset with 3240 observations from 16 different countries. I would like to fit a separate logistic regression model for each of the 16 countries, as I expect the effect of my predictor variable to vary between countries.

Data <- data.frame(
  X = sample(1:100),
  Y = sample(c("yes", "no"), 100, replace = TRUE),
  country = sample(c("UK", "USA", "Denmark", "Norway", "Iceland", "Ireland", "Sweden", "Italy", "France", "Germany", "Luxembourg", "Belgium", "Netherlands", "Spain", "Portugal", "Greece"), 100, replace = TRUE)
)

How can I do this in R?

Cœur
champlos
  • Posting a reproducible example with a sample of your input data is a good way to get help. There are ways to solve your problem in R. – Gopala May 21 '16 at 21:57
  • I have tried to edit it now, so it maybe makes a bit more sense :) – champlos May 21 '16 at 22:09
  • There are a lot of questions that show how to run a regression by a grouping variable ([this is one](http://stackoverflow.com/questions/1169539/linear-regression-and-group-by-in-r), although I'd use a loop). (PS, my 2c: it might be worth testing whether the effects are different by fitting an interaction term, rather than assuming they are.) – user20650 May 21 '16 at 22:12
  • It sounds like perhaps what you really need is a multi-level model. – SlowLoris May 21 '16 at 23:51
  • I already fitted a multi-level model. I am interested in running the 16 countries separately so that I can compare the coefficients from the multi-level model with the coefficients from the separate regressions. – champlos May 22 '16 at 08:15
  • See http://stackoverflow.com/questions/37395059/running-several-linear-regressions-from-a-single-dataframe-in-r/37401209#37401209 for an example of how to do this, except that you have to replace the `lm` call accordingly. – coffeinjunky May 26 '16 at 16:45

1 Answer


Of course you can. Depending on your underlying question, other approaches may be more appropriate (a mixed-effects model, for example).
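One such alternative, raised in the comments, is to test whether the effect of X really differs between countries by fitting an interaction term. The following is only a sketch under assumptions: the data are simulated with the column names from the question, only four of the sixteen countries are used to keep it short, and the object names `fit_common`, `fit_varying` and `lrt` are made up for illustration.

```r
# Simulated stand-in for the real data (assumption: same column names as the question)
set.seed(1)
n <- 3240
countries <- c("UK", "USA", "Denmark", "Norway")
Data <- data.frame(
  X = runif(n, 1, 100),
  Y = factor(sample(c("no", "yes"), n, replace = TRUE)),
  country = factor(sample(countries, n, replace = TRUE))
)

# Model with one common slope for X, and model with a country-specific slope
fit_common  <- glm(Y ~ X + country, data = Data, family = binomial)
fit_varying <- glm(Y ~ X * country, data = Data, family = binomial)

# Likelihood-ratio test of the X:country interaction
lrt <- anova(fit_common, fit_varying, test = "Chisq")
print(lrt)
```

A small p-value in the second row of `lrt` would suggest that the slope of X differs between countries; with purely random data like this, there is no real difference to find.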

The two snippets below fit the same per-country models; the first returns the summaries as a list, the second prints them. You can flesh them out (extracting coefficients, for example).

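# Iterate over each distinct country once, not over every row.
# factor(Y) is needed in R >= 4.0, where data.frame() keeps strings as character.
sapply(unique(as.character(Data$country)), FUN = function(ctry) {
  summary(glm(factor(Y) ~ X, data = Data, family = binomial, subset = country == ctry))
}, simplify = FALSE) # simplify = FALSE returns a named list; with TRUE, sapply tries to coerce the result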

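for (ctry in unique(as.character(Data$country))) { # one pass per distinct country
  cat("Country:", ctry, "\n") # label each summary
  print( # print has to be called explicitly inside a for loop
    summary(glm(factor(Y) ~ X, data = Data, family = binomial, subset = country == ctry))
  )
}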
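
As an illustration of extracting coefficients, here is a minimal sketch that collects the slope of X from each per-country model into one data frame. It assumes simulated data with the question's column names and only four countries; `fits` and `coef_table` are hypothetical names, and `split()` is used here instead of `subset =`, which is just one of several ways to do the grouping.

```r
# Simulated stand-in for the real data (assumption: same column names as the question)
set.seed(42)
countries <- c("UK", "USA", "Denmark", "Norway")
Data <- data.frame(
  X = sample(1:100, 400, replace = TRUE),
  Y = factor(sample(c("no", "yes"), 400, replace = TRUE)),
  country = sample(countries, 400, replace = TRUE)
)

# Fit one logistic model per country; split() returns a named list of data frames
fits <- lapply(split(Data, Data$country), function(d) {
  glm(Y ~ X, data = d, family = binomial)
})

# One row per country: estimate, standard error and p-value of the X slope
coef_table <- do.call(rbind, lapply(names(fits), function(ctry) {
  est <- coef(summary(fits[[ctry]]))["X", ]
  data.frame(
    country   = ctry,
    estimate  = est[["Estimate"]],
    std_error = est[["Std. Error"]],
    p_value   = est[["Pr(>|z|)"]]
  )
}))
print(coef_table)
```

Because the levels of `Y` are ordered "no", "yes", each model describes the log-odds of "yes". The resulting table can then be set side by side with the country-specific effects from a multi-level model.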
Roman Luštrik