
I want to replicate a Stata do-file (panel model) in R, but unfortunately I'm ending up with the wrong standard error estimates. The data are proprietary, so I can't post them here. The Stata code looks like:

xtreg  Y X, vce(cluster countrycodeid) fe nonest dfadj 

Here, fe requests fixed effects, nonest indicates that the panels are not nested within the clusters, and dfadj requests some sort of degrees-of-freedom adjustment; I have not been able to find out which sort so far.

My R code looks like this and reproduces the correct coefficient values:

model <- plm(Y~X+as.factor(year),data=panel,model="within",index=c("codeid","year"))

Now comes the difficult part, for which I haven't found a solution so far, even after trying numerous cluster-robust standard error estimators, for example making extensive use of lmtest and various degrees-of-freedom corrections. The standard errors are supposed to be clustered by country-year pair (captured by the variable countrycodeid in the Stata code, which takes the form codeid-year), as some variables have missing data because they are not available on a monthly basis.

Does anyone know whether there are special tricks to keep in mind when working with unbalanced panels and the plm() package, which sort of degrees-of-freedom adjustment Stata applies here, and whether there is a way to group the data in the coeftest() function on a country-year basis?
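For reference, one of the approaches I tried is clustering on a constructed country-year id via sandwich::vcovCL(), which accepts an arbitrary cluster variable. This is only a sketch on simulated data; the names Y, X, codeid, and year follow my real data, and countryyear is a helper column I build purely for illustration:

```r
library(sandwich)
library(lmtest)

## simulated stand-in for the proprietary panel
set.seed(1)
panel <- data.frame(
  codeid = rep(1:10, each = 24),
  year   = rep(rep(2000:2001, each = 12), times = 10),
  X      = rnorm(240)
)
panel$Y <- 2 * panel$X + rnorm(240)

## country-year cluster id, analogous to Stata's countrycodeid
panel$countryyear <- interaction(panel$codeid, panel$year)

## LSDV (dummy) version of the within model: same point estimates as plm's "within"
fit <- lm(Y ~ X + factor(codeid) + factor(year), data = panel)

## cluster-robust SEs grouped by the country-year pair
coeftest(fit, vcov = vcovCL(fit, cluster = panel$countryyear))["X", ]
```

This at least lets me cluster on a variable that plm's vcovHC (which only knows "group" or "time") does not expose directly, but the resulting standard errors still do not match Stata's.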

Justus_89

1 Answer

This is not a complete answer.

Stata uses a finite-sample correction described in this post. I think applying it may get your standard errors a bit closer.
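A minimal sketch of that correction, assuming it is the usual c = G/(G−1) · (N−1)/(N−K) factor Stata applies to cluster-robust VCEs; the numbers here are illustrative, and the commented lines show how one might scale an unadjusted plm vcov with it:

```r
G <- 50    # number of clusters (illustrative)
N <- 600   # number of observations (illustrative)
K <- 5     # number of estimated parameters (illustrative)

## Stata-style finite-sample correction factor for clustered VCEs
c_stata <- (G / (G - 1)) * ((N - 1) / (N - K))

## Applied to an unadjusted (HC0) clustered vcov from plm, e.g.:
## vc_adj <- c_stata * vcovHC(model, type = "HC0", cluster = "group")
## coeftest(model, vcov = vc_adj)

c_stata
```

The factor is always slightly above 1, so it inflates the standard errors a little; with many clusters and observations it converges to 1.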

Moreover, you can learn more about nonest/dfadj by typing help whatsnew9 in Stata. Stata used to adjust the VCE for the within transformation when the cluster() option was specified; the cluster-robust VCE no longer makes that adjustment unless dfadj is specified. You may need to use version control to replicate old estimates.

dimitriy
  • Thanks for the hint. I had a look at both sources and tried them on my specific problem. The standard error estimates produced using the DF adjustment from the post you referred to are equal to the default coeftest(model, vcov = vcovHC(model, type = "HC1", cluster = "group")), and still too large (~118 compared to ~90, with a coefficient of ~241). I figure the problem may not be due solely to the DF correction but also to how R forms the clusters in the first place, as this seems to be a special case here. But I have totally run out of useful ideas... – Justus_89 Dec 01 '15 at 09:32