30

I would like to find the R implementation that most closely resembles Stata output for fitting a Least Squares Regression function with Heteroskedastic Corrected Standard Errors. Specifically I would like the corrected standard errors to be in the "summary" and not have to do additional calculations for my initial round of hypothesis testing. I am looking for a solution that is as "clean" as what Eviews and Stata provide.

So far, using the "lmtest" package the best I can come up with is:

model <- lm(...)
coeftest(model, vcov = hccm) 

This gives me the output that I want, but it does not seem to be using "coeftest" for its stated purpose. I would also have to use the summary with the incorrect standard errors to read off the R^2 and F stat, etc. I feel that there should exist a "one line" solution to this problem given how dynamic R is.

Thanks

Metrics
  • 15,172
  • 7
  • 54
  • 83
JJJ
  • 597
  • 2
  • 7
  • 14
  • You should note that you need package car also for `hccm()`. It took me a few minutes to work out where that was coming from. – Gavin Simpson Dec 08 '10 at 09:27

3 Answers3

39

I think you are on the right track with coeftest in package lmtest. Take a look at the sandwich package which includes this functionality and is designed to work hand in hand with the lmtest package you have already found.

> # generate linear regression relationship
> # with Homoskedastic variances
> x <- sin(1:100)
> y <- 1 + x + rnorm(100)
> ## model fit and HC3 covariance
> fm <- lm(y ~ x)
> vcovHC(fm)
            (Intercept)           x
(Intercept) 0.010809366 0.001209603
x           0.001209603 0.018353076
> coeftest(fm, vcov. = vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  1.01973    0.10397  9.8081 3.159e-16 ***
x            0.93992    0.13547  6.9381 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

To get the F test, look at function waldtest():

> waldtest(fm, vcov = vcovHC)
Wald test

Model 1: y ~ x
Model 2: y ~ 1
  Res.Df Df      F    Pr(>F)    
1     98                        
2     99 -1 48.137 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You could always cook up a simple function to combine these two for you if you wanted the one-liner...

There are lots of examples in the Econometric Computing with HC and HAC Covariance Matrix Estimators vignette that comes with the sandwich package of linking lmtest and sandwich to do what you want.

Edit: A one-liner could be as simple as:

mySummary <- function(model, VCOV) {
    print(coeftest(model, vcov. = VCOV))
    print(waldtest(model, vcov = VCOV))
}

Which we can use like this (on the examples from above):

> mySummary(fm, vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  1.01973    0.10397  9.8081 3.159e-16 ***
x            0.93992    0.13547  6.9381 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Wald test

Model 1: y ~ x
Model 2: y ~ 1
  Res.Df Df      F    Pr(>F)    
1     98                        
2     99 -1 48.137 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
rafaelvalle
  • 6,683
  • 3
  • 34
  • 36
Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • 7
    @Jerome: If you are happy with an answer, upvote or accept it (that's how things work here on SO). OK, you pay me what you paid for Stata and I'll write the one-liner for you ;-) Seriously, R is developed by volunteers and individuals. These people develop code they need to use and do it how they feel it should be done. `lmtest` is about providing targeted tests of linear models. The authors probably felt it better to provide the raw tools to do these things and didn't have need of the summary-sugar you want. I'll show you how easy it is to write a one-line wrapper in an edit to my Answer. – Gavin Simpson Dec 08 '10 at 13:57
  • any idea what STATA does here, cause I was trying to reproduce results somebody else calculated with STATA's `robust` option. – Matt Bannert May 13 '13 at 15:07
  • 3
    Ah, I can answer my own question I here (the one I raised on the previous comment): The equivalent to what STATA does would be to use `typ="HC1"` with vcovHC. – Matt Bannert May 13 '13 at 15:15
  • 1
    A brief explanation why `lmtest` does not provide `mysummary(..., robust = TRUE)`. First, "robust" is frequently misinterpreted - it's just a very specific type of robustness. Second, the inevitable question would be: What is the "residual standard error" and "R-squared" if you modify the covariance matrix estimate? Answer: Not so clear. You explicitly account for heteroscedasticity so that there is not _one_ residual variance. Similarly, the decomposition of the sums of squares does not work "as usual" for the R-squared. Hence users have to ask specifically for a `coeftest()` or `waldtest()`. – Achim Zeileis Jun 21 '17 at 08:12
10

I found an R function that does exactly what you are looking for. It gives you robust standard errors without having to do additional calculations. You run summary() on an lm.object and if you set the parameter robust=T it gives you back Stata-like heteroscedasticity consistent standard errors.

summary(lm.object, robust=T)

You can find the function on https://economictheoryblog.com/2016/08/08/robust-standard-errors-in-r/

Nick Cox
  • 35,529
  • 6
  • 31
  • 47
Sandra Lopez
  • 117
  • 1
  • 4
  • 1
    This does not work without loading an external script provided by the author which changes the summary function. Without loading that script, R simply ignores the robust variable and outputs the standard heteroscedasticity inconsistent standard errors. – ifly6 Apr 03 '18 at 18:22
2

There is now a one-line solution using lm_robust from the estimatr package, which you can install from CRAN install.packages(estimatr).

> library(estimatr)
> lmro <- lm_robust(mpg ~ hp, data = mtcars, se_type = "stata")
> summary(lmro)

Call:
lm_robust(formula = mpg ~ hp, data = mtcars, se_type = "stata")

Standard error type:  HC1 

Coefficients:
            Estimate Std. Error  Pr(>|t|) CI Lower CI Upper DF
(Intercept) 30.09886    2.07661 4.348e-15 25.85785 34.33987 30
hp          -0.06823    0.01356 2.132e-05 -0.09592 -0.04053 30

Multiple R-squared:  0.6024 ,   Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

You can also get tidy output:

> tidy(lmro)
         term    estimate std.error      p.value    ci.lower
1 (Intercept) 30.09886054 2.0766149 4.347723e-15 25.85784704
2          hp -0.06822828 0.0135604 2.131785e-05 -0.09592231
     ci.upper df outcome
1 34.33987404 30     mpg
2 -0.04053425 30     mpg

The "stata" standard errors default to "HC1" standard errors, which are the default rob standard errors in Stata. You can also get "classical", "HC0", "HC1", "HC2", "HC3" and various clustered standard errors as well (including those that match Stata).

luke.sonnet
  • 475
  • 2
  • 11