11

From what I understand, polynomial regression is a specific type of regression analysis that is more complicated than linear regression. Is there a Python module that can do this? I have looked at matplotlib, scikit-learn, and NumPy but can only find linear regression analysis.

And is it possible to work out the correlation coefficient of a non-linear fit?

astrochris

3 Answers

16

Have you had a look at NumPy's polyfit? See the reference documentation.

From their examples:

>>> import numpy as np
>>> x = np.array([0.0, 1.0, 2.0, 3.0,  4.0,  5.0])
>>> y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
>>> z = np.polyfit(x, y, 3)
>>> z
array([ 0.08703704, -0.81349206,  1.69312169, -0.03968254])
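
Not part of the NumPy example, but to address the correlation-coefficient part of the question: a common approach (a sketch, assuming an R²-style goodness-of-fit number is what's wanted) is to evaluate the fitted polynomial with np.poly1d and compute the coefficient of determination from the residuals:

>>> p = np.poly1d(z)                        # polynomial object for the fitted coefficients
>>> y_fit = p(x)                            # model values at the sample points
>>> ss_res = np.sum((y - y_fit) ** 2)       # residual sum of squares
>>> ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
>>> r_squared = 1 - ss_res / ss_tot

For a straight-line fit this R² equals the square of the Pearson correlation coefficient; for higher degrees it is the usual goodness-of-fit measure.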
adrianus
13

scikit-learn supports linear and polynomial regression.

Check the Generalized Linear Models page, at the section Polynomial regression: extending linear models with basis functions.

Example:

>>> from sklearn.preprocessing import PolynomialFeatures
>>> import numpy as np
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(degree=2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])

The features of X have been transformed from [x_1, x_2] to [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2], and can now be used within any linear model.

This sort of preprocessing can be streamlined with the Pipeline tools. A single object representing a simple polynomial regression can be created and used as follows:

>>> from sklearn.preprocessing import PolynomialFeatures
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.pipeline import Pipeline
>>> model = Pipeline([('poly', PolynomialFeatures(degree=3)),
...                   ('linear', LinearRegression(fit_intercept=False))])
>>> # fit to order-3 polynomial data
>>> x = np.arange(5)
>>> y = 3 - 2 * x + x ** 2 - x ** 3
>>> model = model.fit(x[:, np.newaxis], y)
>>> model.named_steps['linear'].coef_
array([ 3., -2.,  1., -1.])

The linear model trained on polynomial features is able to exactly recover the input polynomial coefficients.
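
Not shown in the scikit-learn docs snippet, but relevant to the follow-up question about fit quality: the fitted pipeline can predict at new points, and its score method returns the coefficient of determination R² (values are rounded here to sidestep floating-point noise):

>>> model.predict(np.array([[5], [6]])).round()  # evaluate the recovered polynomial
array([-107., -189.])
>>> round(model.score(x[:, np.newaxis], y), 6)   # R^2 of the fit; 1.0 since the fit is exact
1.0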

In some cases it's not necessary to include higher powers of any single feature, but only the so-called interaction features that multiply together at most d distinct features. These can be obtained from PolynomialFeatures with the setting interaction_only=True, as shown below.
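
For instance, reusing the X from above (output written out by hand, so treat it as a sketch): with interaction_only=True the squares x_1^2 and x_2^2 are dropped, leaving the bias, the raw features, and the interaction x_1 x_2:

>>> poly = PolynomialFeatures(degree=2, interaction_only=True)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  2.,  3.,  6.],
       [ 1.,  4.,  5., 20.]])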

fferri
  • I see, but I don't fully understand. With a least-squares fit I can get a correlation coefficient, which tells me the quality of the fit. Can I get this number from the code above? – astrochris Jul 14 '15 at 13:07
  • I think you can always compute the [Correlation Coefficient](http://mathworld.wolfram.com/CorrelationCoefficient.html) – fferri Jul 14 '15 at 13:11
  • Check this out http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.pearsonr.html – fferri Jul 14 '15 at 13:15
  • Thanks, but the problem is these are for linear relations, and I have a second/third-order relation. – astrochris Jul 14 '15 at 13:18
  • @astrochris: in all cases, the regression itself is "linear", as the coefficients are linear. The scikit-learn method explicitly shows this: you are doing linear regression over an expanded version of your data that includes the products of features as new features. – Andreus Jul 14 '15 at 22:10
1

You can first create your polynomial features using PolynomialFeatures from sklearn and then use your linear model.

The function below can be used to make predictions with the trained model.

from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

# Expand the 2-D input into degree-2 polynomial features
poly = PolynomialFeatures(degree=2)

# Fit an ordinary linear model on the expanded features
lm_polyfeats = linear_model.LinearRegression()
lm_polyfeats.fit(poly.fit_transform(array2D), targetArray)

def LM_polynomialFeatures_2Darray(lm_polyfeats, array2D):
    # Apply the same polynomial expansion before predicting
    array2D = poly.transform(array2D)
    return lm_polyfeats.predict(array2D)

p = LM_polynomialFeatures_2Darray(lm_polyfeats, array2D)
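
To make the snippet above self-contained, you could define some hypothetical data first (array2D and targetArray are just placeholder names, and the values here are made up):

import numpy as np

array2D = np.random.rand(20, 2)   # 20 samples, 2 features (synthetic)
targetArray = np.random.rand(20)  # 20 synthetic target values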
Ioannis Nasios