
This question is similar to confidence and prediction intervals with StatsModels but with an added nuance:

My data is heteroscedastic, and I would like to plot the confidence interval on the mean using one of the heteroscedasticity-consistent standard errors that statsmodels provides (HC0_se, HC1_se, etc.). I can't find any easy way to access this information for each fitted value (though it's quite easy to get the intervals for each coefficient). It also does not seem to be contained in the results summary table in statsmodels.stats.outliers_influence in the way that the standard mean confidence interval data is.

Two questions:

  1. Does anyone have any idea how I can do this?
  2. What does one typically use the heteroscedasticity-consistent covariance matrices for, which are also available in the linear regression results object? Why are they made available?

Many thanks

AllenQ
  • I don't really understand your second question. Is it: Why do we use heteroscedasticity robust covariances? – Josef Jan 28 '14 at 14:13
  • the second question is a novice question. as in: what would I use the hc covariance matrix for? (probably not appropriate for SO since it's not really about programming)? – AllenQ Jan 28 '14 at 14:20
  • Yes, that's more a stats.stackexchange question. I'll add a brief answer. – Josef Jan 28 '14 at 14:41

2 Answers


I don't believe there's a way to specify which covariance matrix you want to use for calculation of prediction standard errors yet. Note that the prediction code is still in the "sandbox" folder in the statsmodels repository. I'm sure Github pull requests would be welcome :)

In any case, this should be pretty simple to do. Here's a link to the under-the-hood code for the prediction function that you linked to. Essentially, you would just need to substitute the covariance matrix you want to use instead of the covb variable.

Then you can use the same matplotlib tidbit you saw in the other SO post.

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/regression/predstd.py#L27

    # excerpt from wls_prediction_std in the sandbox
    predvar = res.mse_resid/weights + (exog * np.dot(covb, exog.T).T).sum(1)
    predstd = np.sqrt(predvar)
    tppf = stats.t.isf(alpha/2., res.df_resid)
    interval_u = predicted + tppf * predstd
    interval_l = predicted - tppf * predstd
    return predstd, interval_l, interval_u
Vincent

Robust standard errors or covariances are not yet fully integrated into the models. They are currently mainly add-ons to get them after the model is estimated.

In the next release of statsmodels we will be able to change the default covariance to any of the available robust covariance estimators; this is already in current master for OLS. Then all additional results, t_test, wald_test and so on, will use whichever robust or nonrobust covariance has been set as the default. Current version: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html

For the prediction standard errors:

I think the calculations are the same when cov_params is a robust sandwich estimator, but I haven't verified that against Stata. See the last part of my answer to "Mathematical background of statsmodels wls_prediction_std".

So in statsmodels 0.5 it's not possible to get the prediction errors with robust covariances directly; you need to copy the function and substitute the desired cov_params.

Why do we use robust covariances

If there is heteroscedasticity or correlation across observations, then OLS still gives consistent (and unbiased) parameter estimates, but the standard covariance matrix of those estimates is "wrong". So we need a covariance estimator that is robust to heteroscedasticity, correlation, or both.

Many modern econometrics textbooks recommend always using robust covariance estimators when we are not sure about the correct specification of heteroscedasticity or correlation across observations, which is almost always the case in economics.

The simplest case is plain heteroscedasticity (http://en.wikipedia.org/wiki/Heteroscedasticity-consistent_standard_errors), but in time series we might have autocorrelation that we did not include in the model, and in repeated measures or panel data we often have correlation within clusters or panels. Robust covariances give us consistent standard errors in these cases.

The same applies to other models, for example cluster-robust standard errors in Poisson or logit models, or in generalized estimating equations (GEE).

Josef
  • great explanation. i was interested in the predict_mean se, as its used in the SO post i linked. the code in statsmodels.stats.outliers_influence uses predict_mean_se = np.sqrt(infl.hat_matrix_diag*res.mse_resid). is this standard error incorrect if there is heteroscedasticity? if i wanted to adjust this mean_se for heteroscedasticity, how would I adjust the code? – AllenQ Jan 28 '14 at 15:03
  • Using the OLS hat matrix is incorrect, since it doesn't use a robust covariance. If we use `covb` directly as in `wls_prediction_std` and in Vincent's answer, then we can use whichever robust covariance is appropriate for our data as `covb` in the calculation. I guess that there is a correction to the OLS hatmatrix that could be used directly, but I never tried. – Josef Jan 28 '14 at 15:25
  • For `predict_mean` we only need the second part of predvar (in Vincent's answer) – Josef Jan 28 '14 at 15:28
  • just want to confirm I understand: using predvar = (exog * np.dot(covb, exog.T).T).sum(1) to calculate predstd and tppf, I can then calculate the predict_mean interval as: fitted_values +/- tppf*predstd, where fitted_values comes from the OLS results object (e.g.: results.fitted_values()). is this correct? – AllenQ Jan 28 '14 at 15:41