numpy polyfit passing through 0

Question

Suppose I have x and y vectors with a weight vector wgt. I can fit a cubic curve (y = a x^3 + b x^2 + c x + d) by using np.polyfit as follows:

y_fit = np.polyfit(x, y, deg=3, w=wgt)

Now, suppose I want to do another fit, but this time, I want the fit to pass through 0 (i.e. y = a x^3 + b x^2 + c x, d = 0), how can I specify a particular coefficient (i.e. d in this case) to be zero?

Thanks

farenorth · Accepted Answer · 2015-08-28T00:09:43.003

You can use np.linalg.lstsq and construct your coefficient matrix manually. To start, I'll create the example data x and y, and the "exact fit" y0:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(100)
y0 = 0.07 * x ** 3 + 0.3 * x ** 2 + 1.1 * x
y = y0 + 1000 * np.random.randn(x.shape[0])

Now I'll create a full cubic polynomial 'training' or 'independent variable' matrix that includes the constant d column.

XX = np.vstack((x ** 3, x ** 2, x, np.ones_like(x))).T

Let's see what I get if I compute the fit with this dataset and compare it to polyfit:

p_all = np.linalg.lstsq(X_, y)[0]
pp = np.polyfit(x, y, 3)

print np.isclose(pp, p_all).all()
# Returns True

Where I've used np.isclose because the two algorithms do produce very small differences.

You're probably thinking 'that's nice, but I still haven't answered the question'. From here, forcing the fit to have a zero offset is the same as dropping the np.ones column from the array:

p_no_offset = np.linalg.lstsq(XX[:, :-1], y)[0]  # use [0] to just grab the coefs

Ok, let's see what this fit looks like compared to our data:

y_fit = np.dot(p_no_offset, XX[:, :-1].T)

plt.plot(x, y0, 'k-', linewidth=3)
plt.plot(x, y_fit, 'y--', linewidth=2)
plt.plot(x, y, 'r.', ms=5)

This gives this figure,

WARNING: When using this method on data that does not actually pass through (x,y)=(0,0) you will bias your estimates of your output solution coefficients (p) because lstsq will be trying to compensate for that fact that there is an offset in your data. Sort of a 'square peg round hole' problem.

Furthermore, you could also fit your data to a cubic only by doing:

p_ = np.linalg.lstsq(X_[:1, :], y)[0]

Here again the warning above applies. If your data contains quadratic, linear or constant terms the estimate of the cubic coefficient will be biased. There can be times when - for numerical algorithms - this sort of thing is useful, but for statistical purposes my understanding is that it is important to include all of the lower terms. If tests turn out to show that the lower terms are not statistically different from zero that's fine, but for safety's sake you should probably leave them in when you estimate your cubic.

Best of luck!

thanks for the detailed answer. I also learned a new function `np.isclose` that can be useful to me in other contexts — uday, Aug 28 '15 at 00:11

score 5 · Answer 2 · edited Aug 21 '21 at 19:41

5

You can try something like the following:

Import curve_fit from scipy, i.e.

from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np

Define the curve fitting function. In your case,

def fit_func(x, a, b, c):
    # Curve fitting function
    return a * x**3 + b * x**2 + c * x  # d=0 is implied

Perform the curve fitting,

# Curve fitting
params = curve_fit(fit_func, x, y)
[a, b, c] = params[0]
x_fit = np.linspace(x[0], x[-1], 100)
y_fit = a * x_fit**3 + b * x_fit**2 + c * x_fit

Plot the results if you please,

plt.plot(x, y, '.r')         # Data
plt.plot(x_fit, y_fit, 'k')  # Fitted curve

It does not answer the question in the sense that it uses numpy's polyfit function to pass through the origin, but it solves the problem.

Hope someone finds it useful :)

edited Aug 21 '21 at 19:41

Chang Ye

1,105
1
12
25

answered Jul 09 '20 at 09:02

Gerhard

59
1
3

Nice solution! `y_fit` must be calculated from `x_fit`, not `x`. It can be even calculated in a more pythonic way like: `y_fit = fit_func(x_fit, *params[0])`, where the star operator `*` unpacks the array, but your solution is of course very easy to follow. – Marcelo Bergweiler Feb 16 '21 at 19:26
Thanks :) Thank you for that correction, @marcelo. It should indeed be `x_fit`. Excuse the error. – Gerhard Feb 22 '21 at 12:15

numpy polyfit passing through 0

2 Answers2

Linked