
I have the following probit command in Stata and look for the equivalent code in R:

probit mediation viol ethniccomp  lncrisisdur  lncapratio  lnten_mean durable_avg neighbors totaldem_nbrhd geostr medprev jointdem if newcrisis==1, cluster(crisno)

I am able to replicate the coefficient estimates, but not the corrected (clustered) standard errors.

probit.3.1_1 <- glm(mediation ~ viol + ethniccomp + lncrisisdur + lncapratio +
                    lnten_mean + durable_avg + neighbors + totaldem_nbrhd +
                    geostr + medprev + jointdem,
                    data = as.data.frame(basedata[basedata$newcrisis == 1, ]),
                    family = binomial(link = "probit"))

I am basically looking for the equivalent in R for the Stata option cluster(crisno).

I have seen this reply, but as far as I can tell the proposed solution only refers to logit, not probit.
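For reference, a common approach (not mentioned in the question) that reproduces Stata's `cluster()` for a probit `glm` without bootstrapping is a cluster-robust covariance matrix from the `sandwich` package, reported via `lmtest::coeftest`. The sketch below reuses the variable and data names from the question, so it assumes `basedata` with a `crisno` cluster variable:

```r
library(sandwich)  # vcovCL(): cluster-robust covariance estimators
library(lmtest)    # coeftest(): z-tests with a user-supplied vcov

sub <- basedata[basedata$newcrisis == 1, ]
fit <- glm(mediation ~ viol + ethniccomp + lncrisisdur + lncapratio +
           lnten_mean + durable_avg + neighbors + totaldem_nbrhd +
           geostr + medprev + jointdem,
           data = sub, family = binomial(link = "probit"))

# Cluster on crisno; vcovCL applies a small-sample cluster adjustment
# by default, so the result should be close to Stata's, though the
# finite-sample corrections used by the two packages can differ slightly.
coeftest(fit, vcov = vcovCL(fit, cluster = ~ crisno))
```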

    http://stackoverflow.com/questions/19017828/first-difference-linear-panel-model-variance-in-r-and-stata/19057606#19057606 – Metrics Mar 07 '15 at 00:29
  • See also why robust SEs for binary response models are of no help: http://stackoverflow.com/questions/27367974/different-robust-standard-errors-of-logit-regression-in-stata-and-r – landroni Mar 20 '16 at 20:09

1 Answer


I don't know an analytic solution, so I would use a block bootstrap in R using boot from the boot package.

Here's the Stata code that I will benchmark against.

cd "C:\Users\Richard\Desktop\"
use "http://www.ats.ucla.edu/stat/stata/dae/binary.dta", clear
generate group = int((_n - 1) / 20) + 1
probit admit gpa gre, vce(cluster group)
outsheet using "binary.txt", replace

And here is the equivalent in R. The second chunk performs the block bootstrap, resampling whole clusters of `group`, a random clustering variable I created in Stata.

setwd("C:/Users/Richard/Desktop/")
df <- read.delim("binary.txt")

# homoskedastic
probit <- glm(admit ~ gpa + gre, data=df, family=binomial(link=probit)) 

# with block bootstrap using `boot` package
library(boot)
myProbit <- function(x, y) {
    # x is the vector of cluster ids; y is the index vector that `boot`
    # supplies, i.e. a sample of clusters drawn with replacement.
    # Rebuild the data set from the sampled clusters (duplicates included).
    myDf <- do.call("rbind", lapply(y, function(n) subset(df, group == x[n])))
    myModel <- glm(admit ~ gpa + gre, data = myDf,
                   family = binomial(link = "probit"))
    coefficients(myModel)
}
groups <- unique(df$group)
probitBS <- boot(groups, myProbit, 500)

# comparison
summary(probit)
probitBS
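To get clustered standard errors and confidence intervals out of the `boot` object, the replicate matrix `probitBS$t` can be summarized directly, and `boot.ci` gives percentile intervals. This is a sketch continuing the code above; note that `myProbit` returns coefficients in the order `(Intercept)`, `gpa`, `gre`:

```r
# Bootstrap standard errors: column SDs of the replicate matrix,
# labeled with the coefficient names from the homoskedastic fit
bsSE <- setNames(apply(probitBS$t, 2, sd), names(coefficients(probit)))
bsSE

# 95% percentile interval for gpa (second element of the coefficient vector)
boot.ci(probitBS, type = "perc", index = 2)
```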

They get pretty close (Stata results followed by the R block bootstrap results).

Probit regression                                 Number of obs   =        400
                                                  Wald chi2(2)    =      24.03
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood =   -240.094                 Pseudo R2       =     0.0396

                                 (Std. Err. adjusted for 20 clusters in group)
------------------------------------------------------------------------------
             |               Robust
       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gpa |    .454575   .1531717     2.97   0.003     .1543641    .7547859
         gre |   .0016425   .0006404     2.56   0.010     .0003873    .0028977
       _cons |  -3.003536    .520864    -5.77   0.000    -4.024411   -1.982662
------------------------------------------------------------------------------

> probitBS

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = groups, statistic = myProbit, R = 500)


Bootstrap Statistics :
        original        bias     std. error
t1* -3.003535745 -3.976856e-02 0.5420935780
t2*  0.454574799  3.781773e-03 0.1530609943
t3*  0.001642537  4.200797e-05 0.0006210689
    Unfortunately I believe this method to be incorrect as the inclusion of the cluster structure in the likelihood means that the baseline `probit` estimates are biased & inconsistent, see [Dave Giles' blog post](http://davegiles.blogspot.com/2013/05/robust-standard-errors-for-nonlinear.html). So we're getting incorrect coefficients at each step & comparing the distribution of incorrect coefficients to the incorrect baseline coefficient. – MichaelChirico Oct 03 '15 at 02:59
  • @MichaelChirico - Good point. I agree. Thanks for the link! (Although now I'm not sure what is the solution since the link rules out LPM, too!) – Richard Herron Oct 05 '15 at 09:02
  • this problem has been torturing me all weekend... in the end I decided the results were unbiased, I'll add some links when I get to my computer. from there, I did my own bootstrap. – MichaelChirico Oct 05 '15 at 12:34