2

I am quite new to R and used to pretty basic application. Now I have encountered a problem I need help with:

I am looking for a way to cluster standard errors for an ordered logistic regression (my estimation is similar to this example)

I already tried robcov and vcovCL and they give me similar error messages:

  • Error in meatCL(x, cluster = cluster, type = type, ...) : number of observations in 'cluster' and 'estfun()' do not match
  • Error in u[, ii] <- ui : number of items to replace is not a multiple of replacement length

Many thanks in advance!

Edit: I found some more information about the missing values but that does not seem to be the problem - because it persists even if I work around it using this answer, or when use a dataset without NA's. Just as in the example below. The problem seems to be that polr does not give me the residuals as part of the output. How can I work around this?

 dat <- read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")
    length(dat$apply)
    twenty <- seq(from=1, to=20, by=1)
    dat$clustervar<-sample(twenty, size=400, replace=TRUE)



    m <- polr(apply ~ pared + public + gpa, data = dat, Hess=TRUE)
    vcovCL <- function(x, cluster.by, type="sss", dfcw=1){
      # R-codes (www.r-project.org) for computing
      # clustered-standard errors. Mahmood Arai, Jan 26, 2008.

      # The arguments of the function are:
      # fitted model, cluster1 and cluster2
      # You need to install libraries `sandwich' and `lmtest'

      # reweighting the var-cov matrix for the within model
      require(sandwich)
      cluster <- cluster.by
      M <- length(unique(cluster))
      N <- length(cluster)
      stopifnot(N == length(x$residuals))
      K <- x$rank
      ##only Stata small-sample correction supported right now
      ##see plm >= 1.5-4
      stopifnot(type=="sss")
      if(type=="sss"){
        dfc <- (M/(M-1))*((N-1)/(N-K))
      }
      uj  <- apply(estfun(x), 2, function(y) tapply(y, cluster, sum))
      mycov <- dfc * sandwich(x, meat=crossprod(uj)/N) * dfcw
      return(mycov)
    }
    vcovCL(dat, m, dat$clustervar)

This gives me:

Error: N == length(x$residuals) is not TRUE
Called from: vcovCL(dat, m, dat$clustervar)
  • There are many implementations of "ordered logistic regression" in R. You are requested to [edit] your question to include code that loads any require libraries above the base packages and accessed a dataset (perhaps one of the examples in one of those packages) and then attempts to run an analysis. – IRTFM Dec 13 '17 at 17:00
  • (I answered the other question and tried to close this less well presented question.) And yes, missingness will screw up the estimation at least for vcocCL. Take out the missing data for the regression would seem to be the sensible approach. – IRTFM Dec 13 '17 at 17:36
  • Thank you for your comment. I tried to illustrate the question more properly. Also please excuse that I overlooked the answer to the other question. I must have meant to say that it was not the answer I was looking for. – OuVaLeMonde Dec 14 '17 at 16:12
  • The link to the original similar question is now missing but you can get back to it by looking at the edit trail. You should see that the vcovCL function takes a model object as its first object. – IRTFM Dec 14 '17 at 22:16

1 Answers1

3

I had success following the help page for ?sandwich::vcovCL which shows that the first arg to the function is a model object. Needed to use the :: operator to mask the function you offered:

 m <- polr(apply ~ pared + public + gpa, data = dat, Hess=TRUE)
 ( clval <- sandwich::vcovCL(m, dat$clustervar) )
                                  pared       public         gpa unlikely|somewhat likely
pared                       0.085218306  0.005588259  0.04584255               0.15545404
public                      0.005588259  0.092283173 -0.01890725              -0.05875859
gpa                         0.045842552 -0.018907254  0.07067573               0.22455931
unlikely|somewhat likely    0.155454041 -0.058758588  0.22455931               0.72408670
somewhat likely|very likely 0.165079639 -0.058282514  0.23631756               0.75713049
                            somewhat likely|very likely
pared                                        0.16507964
public                                      -0.05828251
gpa                                          0.23631756
unlikely|somewhat likely                     0.75713049
somewhat likely|very likely                  0.80749182

You may need to use the diag of that matrix if you want Wald tests. I think that is what coeftest will deliver:

coeftest( m, vcov = clval)

t test of coefficients:

        Estimate Std. Error t value  Pr(>|t|)    
pared   1.047690   0.291922  3.5889 0.0003738 ***
public -0.058786   0.303781 -0.1935 0.8466565    
gpa     0.615941   0.265849  2.3169 0.0210210 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The other question that prompted the successful search of Rhelp and finding the answer by Achim Zeileis is here

IRTFM
  • 258,963
  • 21
  • 364
  • 487