How to deal with heteroscedasticity in OLS with R

Question

I am fitting a standard multiple regression with OLS. I have 5 predictors (2 continuous and 3 categorical) plus 2 two-way interaction terms. I did regression diagnostics using a residuals vs. fitted plot. Heteroscedasticity is quite evident, which is also confirmed by `bptest()`.
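A minimal sketch of the diagnostic step described above, on simulated data. The variables `x1`, `g`, `y` and the variance pattern are hypothetical, not the asker's data, and the sketch assumes the lmtest package (which provides `bptest()`) is installed. Called on a fitted `lm` object, `bptest()` by default runs the studentized Breusch-Pagan test using the model's own regressors.

```r
library(lmtest)  # provides bptest()

set.seed(1)
n  <- 500
x1 <- runif(n, 1, 10)                                     # continuous predictor
g  <- factor(sample(c("a", "b", "c"), n, replace = TRUE)) # categorical predictor
# hypothetical: error SD grows with x1, so the data are heteroscedastic by construction
y  <- 2 + 0.5 * x1 + as.numeric(g) + rnorm(n, sd = 0.3 * x1)
d  <- data.frame(y, x1, g)

fit <- lm(y ~ x1 * g, data = d)

# residuals vs. fitted: a funnel shape suggests non-constant variance
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# studentized Breusch-Pagan test, using the model's regressors as variance covariates
bp <- bptest(fit)
print(bp)
```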

I don't know what to do next. First, my dependent variable is reasonably symmetric (I don't think I need to try transforming my DV). My continuous predictors are also not highly skewed. I want to use weights in `lm()`; however, how do I know what weights to use?

Is there a way to automatically generate weights for weighted least squares? Or are there other ways to go about it?

I would suggest `gls()` with the `weights` argument specified, but it depends a lot on the pattern of heteroscedasticity, and *why* you want to correct for it (do you want to get correct standard errors? increase efficiency of the estimator?) Please consider including a *small* [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can better understand and more easily answer your question. — Ben Bolker, Jun 24 '14 at 21:20
I think this is probably better asked on Cross Validated, since the main issues are primarily substantive and only secondarily about programming. — Thomas, Jun 24 '14 at 21:34
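On the question of which weights to use: one common data-driven approach is feasible GLS, which the thread does not mention and which is shown here only as one option among several. It models the error variance from the OLS residuals and weights each observation by the reciprocal of its estimated variance. This is a minimal sketch on simulated data; the variance pattern and variable names are assumptions, and with the asker's model the auxiliary regression would include the relevant predictors.

```r
set.seed(2)
n  <- 500
x1 <- runif(n, 1, 10)
y  <- 1 + 2 * x1 + rnorm(n, sd = 0.5 * x1)  # hypothetical: error SD proportional to x1
d  <- data.frame(y, x1)

# Step 0: ordinary least squares
ols <- lm(y ~ x1, data = d)

# Step 1: regress the log of the squared OLS residuals on the predictor(s)
d$log_r2 <- log(resid(ols)^2)
aux <- lm(log_r2 ~ x1, data = d)

# Step 2: exp(fitted) estimates the error variance up to a constant factor,
# which is all that weights need; the weights are its reciprocal
w <- 1 / exp(fitted(aux))

# Step 3: refit by weighted least squares
wls <- lm(y ~ x1, data = d, weights = w)
summary(wls)
```

The log specification keeps every estimated variance positive. The WLS standard errors are only as good as this variance model, so combining WLS with robust standard errors is a cautious option.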

score 6 · Answer 1 · answered Dec 13 '16 at 12:11

One obvious way to deal with heteroscedasticity is to estimate heteroscedasticity-consistent standard errors. These are most often referred to as robust or White standard errors.

You can obtain robust standard errors in R in several ways. The following page describes one simple way:

https://economictheoryblog.com/2016/08/08/robust-standard-errors-in-r
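A minimal sketch of robust standard errors using the sandwich and lmtest packages, on simulated heteroscedastic data. The package choice is an assumption and is not necessarily the method on the linked page. The sketch also rebuilds the HC1 covariance matrix by hand to show what the "sandwich" formula computes.

```r
library(sandwich)  # vcovHC()
library(lmtest)    # coeftest()

set.seed(3)
n  <- 300
x1 <- runif(n, 1, 10)
y  <- 1 + 2 * x1 + rnorm(n, sd = 0.5 * x1)  # heteroscedastic errors
fit <- lm(y ~ x1)

coeftest(fit)                       # classical OLS standard errors
V_rob <- vcovHC(fit, type = "HC1")  # White-type (HC1) covariance matrix
coeftest(fit, vcov. = V_rob)        # same estimates, robust standard errors

# The same HC1 matrix by hand: (X'X)^-1 X' diag(e^2) X (X'X)^-1 * n / (n - k)
X     <- model.matrix(fit)
e     <- resid(fit)
k     <- ncol(X)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)           # sum over i of e_i^2 * x_i x_i'
V_hc1 <- bread %*% meat %*% bread * n / (n - k)
```

If `type` is omitted, `vcovHC()` defaults to `"HC3"`, which tends to behave better in small samples; `"HC1"` is used here only because it is easy to reproduce by hand.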

However, there are sometimes more targeted and often more precise ways to deal with heteroscedasticity. For instance, you might have grouped data in which the error variance differs across groups (clusters) and the errors may also be correlated within a group. In this case you might want to use clustered standard errors. See the following link for how to calculate clustered standard errors in R:

https://economictheoryblog.com/2016/12/13/clustered-standard-errors-in-r
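A minimal sketch with `sandwich::vcovCL()`, again a package choice assumed here rather than taken from the linked page. The data are simulated with a hypothetical cluster identifier `id` and an error component shared within each cluster, which makes the errors correlated within clusters: the situation where clustered standard errors matter.

```r
library(sandwich)  # vcovCL()
library(lmtest)    # coeftest()

set.seed(4)
G  <- 50                           # hypothetical number of clusters
m  <- 10                           # observations per cluster
id <- rep(seq_len(G), each = m)    # cluster identifier
x1 <- rnorm(G)[id] + rnorm(G * m)  # predictor with a cluster-level component
u  <- rnorm(G)[id]                 # error component shared within a cluster
y  <- 1 + 2 * x1 + u + rnorm(G * m)
d  <- data.frame(y, x1, id)

fit  <- lm(y ~ x1, data = d)
V_cl <- vcovCL(fit, cluster = d$id)  # cluster-robust covariance matrix
coeftest(fit)                        # classical SEs ignore within-cluster correlation
coeftest(fit, vcov. = V_cl)          # clustered SEs
```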

score 3 · Answer 2 · edited May 23 '17 at 10:31

What is your sample size? I would suggest that you make your standard errors robust to heteroskedasticity, but that you do not worry about heteroskedasticity otherwise. The reason is that, with or without heteroskedasticity, your OLS parameter estimates are unbiased (i.e. they are fine as they are), although they are no longer the most efficient. What is affected (in linear models!) is the variance-covariance matrix: the usual OLS standard errors of your parameter estimates are no longer valid. If you care about inference on the coefficients and not only about prediction, adjusting the standard errors to be robust to heteroskedasticity should be enough.
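The unbiasedness claim can be checked with a small Monte Carlo sketch in base R. The design and the variance pattern are simulated assumptions: the average OLS slope over many replications stays close to the true value even though the error variance depends on the predictor.

```r
set.seed(5)
n    <- 200
reps <- 1000
x1   <- runif(n, 1, 10)   # design held fixed across replications
est  <- numeric(reps)
se   <- numeric(reps)
for (r in seq_len(reps)) {
  y      <- 1 + 2 * x1 + rnorm(n, sd = 0.5 * x1)  # true slope is 2; heteroscedastic errors
  fit    <- lm(y ~ x1)
  est[r] <- coef(fit)[["x1"]]
  se[r]  <- coef(summary(fit))["x1", "Std. Error"]
}
mean(est)  # close to 2: the slope estimate is unbiased
sd(est)    # actual sampling SD of the slope across replications
mean(se)   # average classical SE, which is not guaranteed to match sd(est)
```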

See e.g. here how to do this in R.

By the way, for your approach with weights (which is not what I would recommend), you may want to look into `?gls` from the nlme package.
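A minimal sketch of `gls()` with variance functions from nlme, on simulated data; the variable names and the variance pattern are assumptions. `varPower(form = ~ x1)` lets the residual standard deviation scale as a power of `x1`, with the power estimated from the data, while `varIdent()` allows a separate residual variance per level of a factor, which may suit categorical predictors.

```r
library(nlme)  # gls(), varPower(), varIdent()

set.seed(6)
n  <- 300
x1 <- runif(n, 1, 10)
g  <- factor(sample(c("a", "b"), n, replace = TRUE))
y  <- 1 + 2 * x1 + rnorm(n, sd = 0.5 * x1)  # true residual SD proportional to x1^1
d  <- data.frame(y, x1, g)

fit_ols <- gls(y ~ x1, data = d)  # constant variance (same coefficients as lm)
# residual SD = sigma * |x1|^delta, with delta estimated along with the model
fit_pow <- gls(y ~ x1, data = d, weights = varPower(form = ~ x1))
# alternative: a separate residual variance for each level of the factor g
fit_grp <- gls(y ~ x1, data = d, weights = varIdent(form = ~ 1 | g))

summary(fit_pow)
anova(fit_ols, fit_pow)  # same fixed effects, so the REML fits can be compared

delta_hat <- coef(fit_pow$modelStruct$varStruct, unconstrained = FALSE)
```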
