14

A newbie question: does anyone know how to run a logistic regression with clustered standard errors in R? In Stata it's just `logit Y X1 X2 X3, vce(cluster Z)`, but unfortunately I haven't figured out how to do the same analysis in R. Thanks in advance!

danilofreire
  • the `vcovHC()` function in the `sandwich` package might also be useful (not sure if it applies to logistic regression estimates) – Ben Bolker May 11 '13 at 21:34
  • if you're migrating from Stata you might find the package called `plm` useful. Also, there is the package called `pcse` for implementing panel corrected standard errors by manipulating the variance covariance matrix after estimation – hubert_farnsworth May 12 '13 at 06:36
  • Thank you very much for your replies, Ben and Hubert. I will also test the packages you have suggested and see if they work with logistic estimates. Thanks again! – danilofreire May 13 '13 at 22:25

4 Answers

16

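You might want to look at the rms (Regression Modeling Strategies) package. There, lrm fits a logistic regression model, and if fit is the name of your fitted model, you'd have something like this:

library(rms)

fit <- lrm(disease ~ age + study + rcs(bmi, 3), x = TRUE, y = TRUE, data = dataf)

fit

# cluster-robust (sandwich) covariance matrix
robcov(fit, cluster = dataf$id)

# cluster bootstrap covariance matrix
bootcov(fit, cluster = dataf$id)

You have to specify x = TRUE, y = TRUE in the model call, because robcov and bootcov need the design matrix and the response stored in the fit. rcs(bmi, 3) fits a restricted cubic spline in bmi with 3 knots.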
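For anyone who wants something they can paste and run, here is a minimal sketch on simulated data. Everything about the data is made up for illustration (the variable names, the 40 clusters, the coefficients), and I left out `study` to keep it short, so treat it as a sketch rather than the original analysis.

```r
library(rms)

set.seed(1)

# Hypothetical data: 40 subjects (id) with 10 rows each and a shared
# subject-level effect, so observations within a subject are correlated.
id      <- rep(1:40, each = 10)
age     <- rnorm(400, mean = 50, sd = 10)
bmi     <- rnorm(400, mean = 27, sd = 4)
u       <- rnorm(40)[id]
disease <- rbinom(400, 1, plogis(-3 + 0.03 * age + 0.05 * bmi + u))
dataf   <- data.frame(disease, age, bmi, id)

# x = TRUE, y = TRUE store the design matrix and response in the fit
fit <- lrm(disease ~ age + rcs(bmi, 3), x = TRUE, y = TRUE, data = dataf)

# Same coefficients, covariance matrix replaced by the cluster-robust one
fit_cl <- robcov(fit, cluster = dataf$id)

# Model-based versus cluster-robust standard errors, side by side
cbind(model_based = sqrt(diag(vcov(fit))),
      clustered   = sqrt(diag(vcov(fit_cl))))
```

Printing `fit_cl` gives the usual lrm output, but with the standard errors and Wald statistics based on the clustered covariance matrix.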

MichaelChirico
David F
  • Thank you very much! It has worked wonders! I will read rms's manual more closely and see if there is a way of clustering the coefficients by country and also by year. Once again, thank you! – danilofreire May 13 '13 at 22:27
  • This answer is already very good but it could be improved if it were fully replicable. I have no idea where the variables come from, what the output is, and why `rcs(bmi,3)` is necessary. – MERose Jan 15 '18 at 14:46
6

Another alternative is to use the sandwich and lmtest packages as follows. Suppose that z is the column with the cluster indicators in your dataset dat. Then

# load libraries
library("sandwich")
library("lmtest")

# fit the logistic regression
fit = glm(y ~ x, data = dat, family = binomial)

# get results with clustered standard errors (of type HC0)
coeftest(fit, vcov. = vcovCL(fit, cluster = dat$z, type = "HC0"))

will do the job.
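If you want to see the effect of clustering, a quick sketch on simulated data puts the model-based and the cluster-robust standard errors next to each other. The data below are invented purely for illustration (50 clusters of 20 observations with a shared cluster effect); only `glm`, `vcovCL` and `coeftest` are the real functions.

```r
library(sandwich)
library(lmtest)

set.seed(42)

# Hypothetical data: 50 clusters of 20 observations each,
# with a cluster-level effect u that induces within-cluster correlation.
n_clusters <- 50
n_per      <- 20
z <- rep(seq_len(n_clusters), each = n_per)
u <- rnorm(n_clusters)[z]
x <- rnorm(n_clusters * n_per)
y <- rbinom(n_clusters * n_per, 1, plogis(-0.5 + 0.8 * x + u))
dat <- data.frame(y = y, x = x, z = z)

fit <- glm(y ~ x, data = dat, family = binomial)

# Cluster-robust covariance matrix (type HC0), clustered on z
vc_cl <- vcovCL(fit, cluster = ~ z, type = "HC0")

# Model-based versus clustered standard errors
se_naive <- sqrt(diag(vcov(fit)))
se_cl    <- sqrt(diag(vc_cl))
cbind(naive = se_naive, clustered = se_cl)

# Coefficient table using the clustered covariance matrix
coeftest(fit, vcov. = vc_cl)
```

Passing the cluster as a formula (`~ z`) or as a vector (`dat$z`) should give the same result here, since there are no missing values.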

baruuum
  • Fun fact on that comment: The functions in `miceadds` actually refer to the sandwich package :) – Poza Mar 02 '22 at 13:24
5

I have been banging my head against this problem for the past two days; I magically found what appears to be a new package which seems destined for great things--for example, I am also running in my analysis some cluster-robust Tobit models, and this package has that functionality built in as well. Not to mention the syntax is much cleaner than in all the other solutions I've seen (we're talking near-Stata levels of clean).

So for your toy example, I'd run:

library(Zelig)
logit<-zelig(Y~X1+X2+X3,data=data,model="logit",robust=T,cluster="Z")

Et voilà!

Nick Cox
MichaelChirico
  • Wow, that does appear to "just work" in ways that my R code never seems to. Is this new functionality? If not, why has Zelig not been the canonical way to solve this in R? – Philip May 05 '15 at 03:35
  • Don't know, but I hope it becomes so. [The project](http://zeligproject.org) certainly seems ambitious! The [Google group](https://groups.google.com/forum/m/#!forum/zelig-statistical-software) doesn't seem so active though, so not sure how quick progress is. – MichaelChirico May 05 '15 at 04:11
  • Unfortunately, I think the command doesn't work in the latest version of `Zelig` (on CRAN). I've just run a few models with and without the `cluster` argument and the standard errors are exactly the same. I believe it's been like that since version 4.0, the last time I used the package. – danilofreire Jul 01 '15 at 05:07
  • yes, indeed they've dropped that functionality for now. check their google group (go to the community section of their website)--they're in the middle of restructuring the whole project; one of the developers said in reply to a post of mine that they're working on bringing back cluster/robust functionality – MichaelChirico Jul 01 '15 at 11:50
  • About three years later, cluster functionality is not back: `Error in glm.control(cluster = "group") : unused argument (cluster = "group")`. – MERose Mar 08 '18 at 10:26
  • Any update on whether this now includes the cluster functionality? – melbez Oct 21 '20 at 19:14
5

There is a command glm.cluster in the R package miceadds which seems to give the same results for logistic regression as Stata does with the option vce(cluster). See the documentation here.

In one of the examples on this page, the commands

mod2 <- miceadds::glm.cluster(data=dat, formula=highmath ~ hisei + female,
                              cluster="idschool", family="binomial")
summary(mod2)

give the same robust standard errors as the Stata command

logit highmath hisei female, vce(cluster idschool)

e.g. a standard error of 0.004038 for the variable hisei.
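If you don't have that dataset at hand, the same call can be tried on simulated data. This is only a sketch: the data frame, the variable names and the cluster structure below are invented for illustration, and I haven't compared these numbers against Stata.

```r
library(miceadds)

set.seed(7)

# Hypothetical data: 30 schools with 25 students each and a
# school-level effect shared by all students in the same school.
idschool <- rep(1:30, each = 25)
u        <- rnorm(30)[idschool]
x        <- rnorm(750)
y        <- rbinom(750, 1, plogis(0.2 + 0.5 * x + u))
dat_sim  <- data.frame(y = y, x = x, idschool = idschool)

# Logistic regression with standard errors clustered on idschool
mod <- miceadds::glm.cluster(data = dat_sim, formula = y ~ x,
                             cluster = "idschool", family = "binomial")
summary(mod)

# Coefficients and clustered standard errors as plain vectors
coef(mod)
sqrt(diag(vcov(mod)))
```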