4

I used the statsmodels package to estimate my OLS regression. Now I want Breusch Pagan test. I used the pysal package for this test but this function returns an error:

import statsmodels.api as sm
import pysal

model = sm.OLS(Y,X,missing = 'drop')
rs = model.fit()
pysal.spreg.diagnostics.breusch_pagan(rs)

returned error:

AttributeError: 'OLSResults' object has no attribute 'u'

What should I do?

seaotternerd
  • 6,298
  • 2
  • 47
  • 58
Eghbal
  • 3,892
  • 13
  • 51
  • 112
  • 1
    I don't know about the exception, but you could also use breushpagan from statsmodels, which takes OLS residuals and candidates for explanatory variables for the heteroscedasticity, http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.diagnostic.het_breushpagan.html with examples here http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/regression_diagnostics.html – Josef May 05 '15 at 19:13
  • Thank you. Please add your answer. – Eghbal May 05 '15 at 20:10

2 Answers2

6

The problem is that the regression results instance of statsmodels is not compatible with the one in pysal.

You can use breuschpagan from statsmodels, which takes OLS residuals and candidates for explanatory variables for the heteroscedasticity and so it does not rely on a specific model or implementation of a model.

documentation: https://www.statsmodels.org/devel/generated/statsmodels.stats.diagnostic.het_breuschpagan.html

with examples here https://www.statsmodels.org/devel/examples/notebooks/generated/regression_diagnostics.html

I do not know if there are any essential differences in the implementation of the Breusch-Pagan test.

It looks like the name is misspelled in statsmodels.

edit The spelling of the name has been corrected in statsmodels version 0.9. The old incorrect spelling was breushpagan.

Josef
  • 21,998
  • 3
  • 54
  • 67
  • Seems the old links have broken. This blog: https://www.statology.org/breusch-pagan-test-python/ tells about the test, how to implement it using statsmodel in a verbose and explicit manner. – raghavsikaria Aug 29 '20 at 12:00
  • I fixed the links and adjusted to the corrected spelling of `breuschpagan` which was missing a "c" – Josef Aug 29 '20 at 13:25
2

As I tend not to use the statsmodels library, I have created a Python function to perform the Breusch-Pagan test. It uses multiple linear regression from SciKit-learn.

import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import chisqprob


def breusch_pagan_test(x, y):
    '''
    Breusch-Pagan test for heteroskedasticity in a linear regression model:
    H_0 = No heteroskedasticity.
    H_1 = Heteroskedasticity is present.

    Inputs:
    x = a numpy.ndarray containing the predictor variables. Shape = (nSamples, nPredictors).
    y = a 1D numpy.ndarray containing the response variable. Shape = (nSamples, ).

    Outputs a list containing three elements:
    1. the Breusch-Pagan test statistic.
    2. the p-value for the test.
    3. the test result.
    '''

    if y.ndim != 1:
        raise SystemExit('Error: y has more than 1 dimension.')
    if x.shape[0] != y.shape[0]:
        raise SystemExit('Error: the number of samples differs between x and y.')
    else:
        n_samples = y.shape[0]

    # fit an OLS linear model to y using x:
    lm = LinearRegression()
    lm.fit(x, y)

    # calculate the squared errors:
    err = (y - lm.predict(x))**2

    # fit an auxiliary regression to the squared errors:
    # why?: to estimate the variance in err explained by x
    lm.fit(x, err)
    pred_err = lm.predict(x)
    del lm

    # calculate the coefficient of determination:
    ss_tot = sum((err - np.mean(err))**2)
    ss_res = sum((err - pred_err)**2)
    r2 = 1 - (ss_res / ss_tot)
    del err, pred_err, ss_res, ss_tot

    # calculate the Lagrange multiplier:
    LM = n_samples * r2
    del r2

    # calculate p-value. degrees of freedom = number of predictors.
    # this is equivalent to (p - 1) parameter restrictions in Wikipedia entry.
    pval = chisqprob(LM, x.shape[1])

    if pval < 0.01:
        test_result = 'Heteroskedasticity present at 99% CI.'
    elif pval < 0.05:
        test_result = 'Heteroskedasticity present at 95% CI.'
    else:
        test_result = 'No significant heteroskedasticity.'
    return [LM, pval, test_result]
  • 3
    Just to note, the `chisqprob` is deprecated, and `scipy.stats.distributions.chi2.sf` is the suggested alternative : https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.chisqprob.html – Elena Viter Sep 16 '19 at 22:01
  • If x is list of graphs represented by id, nbr_of_edges, nbr_of_nodes & y is a computed index on the graph, like connectivity index, ... is that valid input? – sAguinaga Jul 07 '20 at 23:21
  • Hi sAguinaga, I'm sorry for the late reply. Its been a while since I wrote this, but "x" is just a numpy array containing the predictor variables in a linear regression. For example, if you have 100 training samples (i.e. unique data points) and each has 5 predictor variables, x.shape=(100, 5). –  Mar 16 '21 at 15:14
  • 1
    Small correction: the Breusch-Pagan test regresses the normalized errors onto the predictors. So `resid**2 / mean(resid**2)` instead of just `resid**2`. This is what they refer to as `g` in the original paper. – jds Jan 27 '22 at 13:34
  • Thank you for the correction. I was unaware of this. –  Jan 27 '22 at 16:07