Hello (first timer here),

I would like to estimate a "two-way" cluster-robust variance-covariance matrix in R. I am using a particular canned routine from the "multiwayvcov" library. My question relates solely to the set-up of the cluster.vcov function in R. I have panel data of various crime outcomes. My cross-sectional unit is the "precinct" (over 40 precincts) and I observe crime in those precincts over several "months" (i.e., 24 months). I am evaluating an intervention that 'turns on' (dummy coded) for only a few months throughout the year.

I include "precinct" and "month" fixed effects (i.e., a full set of precinct and month dummies enter the model). I have only one independent variable I am assessing. I want to cluster on "both" dimensions but I am unsure how to set it up.

Do I estimate all the fixed effects with lm first? Or do I simply regress crime on the independent variable (excluding the fixed effects) and then call cluster.vcov with ~ precinct + month_year?

This seems like it would provide the wrong standard error though. Right? I hope this was clear. Sorry for any confusion. See my set up below.

library(multiwayvcov)
library(lmtest)  # provides coeftest()

model <- lm(crime ~ as.factor(precinct) + as.factor(month_year) + policy, data = DATASET_full)

boot_both <- cluster.vcov(model, ~ precinct + month_year)

coeftest(model, boot_both)

### What the documentation offers as an example
### https://cran.r-project.org/web/packages/multiwayvcov/multiwayvcov.pdf

library(lmtest)

data(petersen)

m1 <- lm(y ~ x, data = petersen)

### Double cluster by firm and year using a formula

vcov_both_formula <- cluster.vcov(m1, ~ firmid + year)

coeftest(m1, vcov_both_formula)

Is it appropriate to first estimate a model that ignores the fixed effects?

NelsonGon
Thomas Bilach

1 Answer

First the answer: you should first estimate your lm model with the fixed effects included. This gives you the correct point estimates. The default standard errors are incorrect, however, because they are calculated from a vcov matrix that assumes iid errors.

To replace the iid covariance matrix with a cluster-robust vcov matrix, pass the fitted model to cluster.vcov, i.e. my_new_vcov_matrix <- cluster.vcov(model, ~ precinct + month_year), and then hand that matrix to coeftest.
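To make the "fit with fixed effects first, then swap the vcov" step concrete, here is a minimal base-R sketch of what a two-way cluster-robust estimator does with the fitted model. The data are simulated (the panel data frame and its coefficients are made up for illustration, not your data), and the sketch leaves out the finite-sample corrections that packages typically apply, so do not expect the numbers to match cluster.vcov exactly.

```r
# Simulated panel: 40 precincts x 24 months (hypothetical data for illustration).
set.seed(1)
panel <- expand.grid(precinct = 1:40, month_year = 1:24)
# Policy switches on for a few months in half of the precincts (assumption).
panel$policy <- as.integer(panel$month_year %in% 10:13 & panel$precinct <= 20)
panel$crime <- 5 + 0.1 * panel$precinct + 0.05 * panel$month_year -
  1.5 * panel$policy + rnorm(nrow(panel))

# Step 1: estimate the model WITH both sets of fixed effects.
model <- lm(crime ~ policy + factor(precinct) + factor(month_year), data = panel)

# Step 2: build the sandwich pieces from that fitted model.
X <- model.matrix(model)
u <- residuals(model)
bread <- solve(crossprod(X))

# One-way cluster-robust vcov: sum the scores X * u within each cluster.
one_way <- function(cluster) {
  scores <- rowsum(X * u, group = cluster)
  bread %*% crossprod(scores) %*% bread
}

# Two-way = precinct + month - (precinct x month intersection).
v_precinct <- one_way(panel$precinct)
v_month    <- one_way(panel$month_year)
v_both_ids <- one_way(interaction(panel$precinct, panel$month_year, drop = TRUE))
v_twoway   <- v_precinct + v_month - v_both_ids

# Variance of the policy coefficient; because of the subtraction it is
# not guaranteed to be positive in finite samples.
var_policy <- v_twoway["policy", "policy"]
```

The residuals and the design matrix both come from the model that already contains the precinct and month dummies, which is why the fixed effects have to be in the lm call before the clustered matrix is computed.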

Then a recommendation: I warmly recommend the function felm from the lfe package for both multi-way fixed effects and cluster-robust standard errors.

The syntax is as follows:

library(multiwayvcov)
library(lfe)

data(petersen)

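# felm formula parts: outcome ~ regressors | fixed effects | IV spec (0 = none) | cluster variables
my_fe_model <- felm(y ~ x | firmid + year | 0 | firmid + year, data = petersen)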

summary(my_fe_model)
Otto Kässi
  • Thank you for your prompt response! I like both functions. However, `cluster.vcov` works well with `glm` objects too (I have count data as well). If I wanted to share a subset of my dataset with you for clarification, what is the best way to post it here without copying and pasting hundreds of rows? – Thomas Bilach May 02 '19 at 17:10
  • You might reconsider using robust covariance estimators with glm. https://davegiles.blogspot.com/2013/05/robust-standard-errors-for-nonlinear.html. – Otto Kässi May 02 '19 at 19:21
  • Re: how to attach data to SO questions. Look at examples here https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Otto Kässi May 02 '19 at 19:22
  • Thank you all! Employing a "double-clustering" approach (i.e., clustering on "units" and "time") sometimes produces negative variances in a variance-covariance matrix. This can be corrected when `fix = TRUE` inside the `vcovCL` function. Why does "two-way" clustering produce negative variances? Isn't this impossible? Although I've corrected the issue, it has been difficult to wrap my head around it, conceptually. Any thoughts? – Thomas Bilach May 03 '19 at 23:54
  • @Tom you will get a wider audience for your question if you create a new question on https://stats.stackexchange.com rather than ask it in the comments of a previous (unrelated) question. – Otto Kässi May 04 '19 at 16:10
  • Thank you, Otto. I will do so. – Thomas Bilach May 05 '19 at 20:21