R: Regressions with group fixed effects and clustered standard errors with imputed dataset

Question

I am trying to run regressions in R (multiple models - poisson, binomial and continuous) that include fixed effects of groups (e.g. schools) to adjust for general group-level differences (essentially demeaning by group) and that cluster standard errors to account for the nesting of participants in the groups. I am also running these over imputed data frames (created with mice). It seems that different disciplines use the phrase ‘fixed effects’ differently so I am having a hard time searching to troubleshoot.

I have fit random intercept models (with lme4) but they do not account for the school fixed effects (and the random effects are not of interest to my research questions). Putting the groups in as dummies slows the run down tremendously. I could also run a single level glm/lm with group dummies but I have not been able to find a strategy to cluster the standard errors with the imputed data (tried the clusterSE package). I could hand calculate the demeaning but there seems like there should be a more direct way to achieve this.

 I have also looked at the lfe package but that does not seem to have glm options and the demeanlist function does not seem to be compatible with the imputed data frames.

In Stata, the command would be xtreg, fe vce (Cluster Variable), (fe = fixed effects, vce = clustered standard errors, with mi added to run over imputed dataframes). I could switch to Stata for the modeling but would definitely prefer to stay with R if possible!

Please let me know if this is better posted in cross-validated - I was on the fence but went with this one since it seemed to be more a coding question.

Thank you!

Look into the `plm` package to run FE panel model, then run in `lm.test` package's `coeftest()` to estimate with standard errors. See this [Cross Validated post](http://stats.stackexchange.com/questions/10017/standard-error-clustering-in-r-either-manually-or-in-plm). — Parfait, Mar 25 '17 at 13:59
Thanks for the post - I had looked at the plm package but since I did not have panel data I thought it wasn't the right fit. The comments on that post were helpful for using the plm regardless. — Elizabeth H, Mar 26 '17 at 13:48

score 0 · Accepted Answer · answered Mar 25 '17 at 15:39

0

I would block bootstrap. The "block" handles the clustering and "bootstrap" handles the generated regressors.

There is probably a more elegant way to make this extensible to other estimators, but this should get you started.

# junk data
x <- rnorm(100)
y <- 1 + 2*x + rnorm(100)
dat1 <- data.frame(y, x, id=seq_along(y))
summary(lm(y ~ x, data=dat1))

# same point estimates, but lower SEs
dat2 <- dat1[rep(seq_along(y), each=10), ]
summary(lm(y ~ x, data=dat2))

# block boostrap helper function
require(boot)
myStatistic <- function(ids, i) {
  myData <- do.call(rbind, lapply(i, function(i) dat2[dat2$id==ids[i], ]))
  myLm <- lm(y ~ x, data=myData)
  myLm$coefficients
}

# same point estimates from helper function if original data
myStatistic(unique(dat2$id), 1:100)


# block bootstrap recovers correct SEs
boot(unique(dat2$id), myStatistic, 500)

answered Mar 25 '17 at 15:39

Richard Herron

9,760
12
69
116

Thank you - your comment lead me to the multiwayvcov package which has a block bootstrap option. It didn't work with the imputed data though but it ultimately led me to the miceadds package's lm.cluster/glm.cluster (https://rdrr.io/cran/miceadds/man/lm.cluster.html) function, which seems to do the trick once I add + factor(cluster variable) (to add dummies for each group) to the regression formula. – Elizabeth H Mar 26 '17 at 13:51
@ElizabethH -- Nice. I didn't know of these packages (I mainly use Stata). I think you should address that you have a generated regressor. Stata does this for you with `mi` , but I am not sure these R packages know that you have imputed data. Bootstrapping is one solution. – Richard Herron Mar 26 '17 at 21:27
I used the 'with' command to run it over the imputed data (similar to the comments here - http://stackoverflow.com/questions/26473684/run-glm-mids-on-a-subset-of-imputed-data-from-mice-r). In case anyone references this question in the future, today I also found glmmml package has an option (glmmboot) but it only works with poisson and binomial outcomes. – Elizabeth H Mar 27 '17 at 21:35

R: Regressions with group fixed effects and clustered standard errors with imputed dataset

1 Answers1