How do you fit a polynomial to a data set?

Question

I'm working on two functions. I have two data sets, dataSet and testData, each of the form [[x(1), y(1)], ..., [x(n), y(n)]].

createMatrix(D, S) which returns a data matrix, where D is the degree and S is a vector of real numbers [s(1), s(2), ..., s(n)].

I know numpy has a function called polyfit, but polyfit takes three arguments. Any advice on how I'd create the matrix?

polyFit(D), which takes in the polynomial of degree D and fits it to the data sets using linear least squares. I'm trying to return the weight vector and errors. I also know that there is lstsq in numpy.linalg, which I found in this question: Fitting polynomials to data

Is it possible to use that question to recreate what I'm trying?

This is what I have so far, but it isn't working.

def createMatrix(D, S):
  x = []
  y = []
  for i in dataSet:
    x.append(i[0])
    y.append(i[1])
  polyfit(x, y, D)

What I don't get here is what does S, the vector of real numbers, have to do with this?

def polyFit(D)

I'm basing a lot of this on the question posted above. I'm unsure how to get just w, the weight vector, though. I'll be coding the errors myself, so that's fine; I was just wondering if you have any advice on getting the weight vector itself.

score 1 · Accepted Answer · answered Oct 05 '17 at 02:58

It looks like all createMatrix is doing is creating the two vectors required by polyfit. What you have is close, though it never returns the result of polyfit. A more Pythonic way to do it is:

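from numpy import polyfit

def createMatrix(dataSet, D=3):
    # D is the polynomial degree; pass whatever degree you're trying
    x, y = zip(*dataSet)     # split [[x, y], ...] into x values and y values
    return polyfit(x, y, D)  # coefficients, highest power first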

(This S/O link provides a detailed explanation of the zip(*dataSet) idiom.)
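As a quick illustration of that idiom, here is a minimal sketch. The sample values in dataSet are made up for the example; only the [[x, y], ...] layout comes from the question.

```python
# Hypothetical sample data in the [[x, y], ...] layout from the question
dataSet = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]

# The * unpacks dataSet so that each [x, y] pair becomes a separate
# argument to zip; zip then groups all first elements together and
# all second elements together.
x, y = zip(*dataSet)

print(x)  # (0.0, 1.0, 2.0)
print(y)  # (1.0, 3.0, 5.0)
```

Note that zip yields tuples, not lists; polyfit accepts either.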

This will return a vector of coefficients that you can then pass to something like poly1d to generate results. (Further explanation of both polyfit and poly1d can be found here.)
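A small sketch of how the two fit together, assuming made-up data that lies exactly on y = 2x + 1 (the data and the variable names are illustrative, not from the question):

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

# polyfit returns the coefficients with the highest power first,
# so for degree 1 this is approximately [slope, intercept]
coeffs = np.polyfit(x, y, 1)

# poly1d wraps the coefficients in a callable polynomial
model = np.poly1d(coeffs)

# Evaluate the fitted polynomial at a new x value
prediction = model(4.0)
print(coeffs, prediction)
```

For this data the coefficients should come out close to [2, 1] and the prediction close to 9, up to floating-point rounding.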

Obviously, you'll need to decide what value you want for D. The simple answer to that is 1, 2, or 3. Polynomials of higher order than cubic tend to overfit: they chase the noise in the data and extrapolate poorly, so their output is often not meaningful.

It sounds like you might be trying to do some sort of correlation analysis (i.e., does y vary with x and, if so, to what extent?). You'll almost certainly want to just use linear (D = 1) regression for this type of analysis. You can try a least squares quadratic fit (D = 2) but, again, the error bounds are probably wider than your assumptions (e.g. normality of distribution) will tolerate.
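If the exercise really does want an explicit data matrix and a lstsq-based fit, one possible sketch follows. It rests on assumptions not stated in the question: that the "data matrix" is the Vandermonde matrix of the x values (so S would be the vector of x values), and that the error is the sum of squared residuals. The snake_case names, the helper squared_error, and the sample data are all hypothetical.

```python
import numpy as np

def create_matrix(D, S):
    """Assumed data matrix: n x (D+1), columns s**0, s**1, ..., s**D."""
    return np.vander(np.asarray(S, dtype=float), D + 1, increasing=True)

def poly_fit(D, data):
    """Fit a degree-D polynomial to [[x, y], ...] by linear least squares.

    Returns the weight vector w, lowest power first
    (the opposite order from numpy.polyfit).
    """
    x, y = zip(*data)
    A = create_matrix(D, x)
    w, _, _, _ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return w

def squared_error(w, data):
    """Sum of squared residuals of the fitted polynomial on a data set."""
    x, y = zip(*data)
    residuals = create_matrix(len(w) - 1, x) @ w - np.asarray(y, dtype=float)
    return float(residuals @ residuals)

# Hypothetical data generated from y = 1 + 2x + 3x^2
dataSet = [[0, 1], [1, 6], [2, 17], [3, 34], [4, 57]]
testData = [[5, 86], [6, 121]]

w = poly_fit(2, dataSet)
train_error = squared_error(w, dataSet)
test_error = squared_error(w, testData)
print(w, train_error, test_error)
```

Because the sample data is noise-free, w should come out close to [1, 2, 3] and both errors close to zero. With real data, comparing the error on testData across a few values of D is one way to see when a higher degree stops helping.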

Thank you, that's perfect! Any advice on my own polyFit? – Andrew Raleigh, Oct 05 '17 at 14:41
