
I have some data like

arr = [
    [30.0, 0.0257],
    [30.0, 0.0261],
    [30.0, 0.0261],
    [30.0, 0.026],
    [30.0, 0.026],
    [35.0, 0.0387],
    [35.0, 0.0388],
    [35.0, 0.0387],
    [35.0, 0.0388],
    [35.0, 0.0388],
    [40.0, 0.0502],
    [40.0, 0.0503],
    [40.0, 0.0502],
    [40.0, 0.0498],
    [40.0, 0.0502],
    [45.0, 0.0582],
    [45.0, 0.0574],
    [45.0, 0.058],
    [45.0, 0.058],
    [45.0, 0.058],
    [50.0, 0.0702],
    [50.0, 0.0702],
    [50.0, 0.0698],
    [50.0, 0.0704],
    [50.0, 0.0703],
    [55.0, 0.0796],
    [55.0, 0.0808],
    [55.0, 0.0803],
    [55.0, 0.0805],
    [55.0, 0.0806],
]

which I have plotted with the Google Charts API.

I am trying to do linear regression on this, i.e. find the slope and the (y-)intercept of the trend line, along with the uncertainty in the slope and the uncertainty in the intercept.

The Google Charts API already finds the slope and the intercept value when I draw the trend line, but I am not sure how to find the uncertainties.

I have been doing this with the LINEST function in Excel, but I find that very cumbersome, since all my data is in Python.

So my question is, how can I find the two uncertainty values that I get in LINEST using Python?

I apologize for asking an elementary question like this.

I am pretty good at Python and JavaScript, but very poor at regression analysis, so when I tried to look this up in the documentation, the difficult terminology left me very confused.

I hope to use some well-known Python library, although it would be ideal if I could do this within Google Charts API.

Eric
  • I think this might help you http://stackoverflow.com/questions/11479064/multivariate-linear-regression-in-python/14971531#14971531 – Akavall Sep 26 '14 at 01:58
  • I am an absolute novice when it comes to regression or any statistical methods. Unfortunately, the link does not help. Sorry. – Eric Sep 26 '14 at 02:08

1 Answer


It could be done using statsmodels like this:

import statsmodels.api as sm
import numpy as np


# split the [x, y] pairs into separate lists
x = [item[0] for item in arr]
y = [item[1] for item in arr]

# include constant in ols models, which is not done by default
x = sm.add_constant(x)

model = sm.OLS(y,x)
results = model.fit()

You could then access the values you require as follows. The intercept and the slope are given by:

results.params # linear coefficients
# array([-0.036924 ,  0.0021368])

I suppose you mean the standard errors when you refer to uncertainty. They can be accessed like this:

results.bse # standard errors of the parameter estimates
# array([  1.03372221e-03,   2.38463106e-05])
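For anyone who wants to see where numbers like these come from, here is a minimal sketch that computes the slope, intercept, and their standard errors from the usual textbook formulas for simple least-squares regression, using only the standard library. The function name `linear_fit_with_errors` and the `sample` list are my own choices for illustration, not part of statsmodels, and the sketch assumes the default (nonrobust) standard errors with n - 2 residual degrees of freedom.

```python
import math


def linear_fit_with_errors(points):
    """Fit y = intercept + slope * x by ordinary least squares.

    points is a list of [x, y] pairs.
    Returns (slope, intercept, se_slope, se_intercept).
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points to estimate standard errors")

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    resid_var = ssr / (n - 2)

    se_slope = math.sqrt(resid_var / sxx)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_mean ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept


# A few rows taken from the question's data, just to exercise the function.
sample = [
    [30.0, 0.0257],
    [35.0, 0.0387],
    [40.0, 0.0502],
    [45.0, 0.0582],
    [50.0, 0.0702],
    [55.0, 0.0796],
]
slope, intercept, se_slope, se_intercept = linear_fit_with_errors(sample)
print("slope     = %.6g +/- %.2g" % (slope, se_slope))
print("intercept = %.6g +/- %.2g" % (intercept, se_intercept))
```

Passing the full arr from the question instead of sample should give values that agree with results.params and results.bse to within rounding.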

An overview can be obtained by running

>>> print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.997
Model:                            OLS   Adj. R-squared:                  0.996
Method:                 Least Squares   F-statistic:                     8029.
Date:                Fri, 26 Sep 2014   Prob (F-statistic):           5.61e-36
Time:                        05:47:08   Log-Likelihood:                 162.43
No. Observations:                  30   AIC:                            -320.9
Df Residuals:                      28   BIC:                            -318.0
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         -0.0369      0.001    -35.719      0.000        -0.039    -0.035
x1             0.0021   2.38e-05     89.607      0.000         0.002     0.002
==============================================================================
Omnibus:                        7.378   Durbin-Watson:                   0.569
Prob(Omnibus):                  0.025   Jarque-Bera (JB):                2.079
Skew:                           0.048   Prob(JB):                        0.354
Kurtosis:                       1.714   Cond. No.                         220.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

This might also be of interest for a summary of the properties of the resulting model.

I did not compare the results to LINEST in Excel. I also don't know whether this is possible using only the Google Charts API.