17

Let's say I have a simple data set. Perhaps in dictionary form, it would look like this:

{1:5, 2:10, 3:15, 4:20, 5:25}

(the order is always ascending). What I want to do is logically figure out what the next data point is most likely to be. In this case, for example, it would be {6: 30}

What would be the best way to do this?

corvid
  • possible duplicate of [How to make scipy.interpolate give an extrapolated result beyond the input range?](http://stackoverflow.com/questions/2745329/how-to-make-scipy-interpolate-give-an-extrapolated-result-beyond-the-input-range) – Yoann Quenach de Quivillic Oct 16 '13 at 14:45
  • 1
Dictionaries are unordered collections, so your "order is always ascending" remark may be a dangerous assumption, since `for key in d` will iterate over the keys however Python sees fit, not in the order you created them. – Jaime Oct 16 '13 at 15:43
  • 1
    I meant more in terms of the data, as in each numerically higher key has a numerically higher value – corvid Oct 16 '13 at 17:02

6 Answers

14

You can also use numpy's polyfit:

import numpy as np

data = np.array([[1, 5], [2, 10], [3, 15], [4, 20], [5, 25]])
fit = np.polyfit(data[:, 0], data[:, 1], 1)  # the 1 requests a linear (degree-1) fit

fit
[  5.00000000e+00   1.58882186e-15]  #y = 5x + 0

line = np.poly1d(fit)
new_points = np.arange(5)+6

new_points
[ 6, 7, 8, 9, 10]

line(new_points)
[ 30.  35.  40.  45.  50.]

This allows you to alter the degree of the polynomial fit quite easily, as the function takes the following arguments: np.polyfit(x data, y data, degree). Shown is a linear fit, where the returned array looks like fit[0]*x^n + fit[1]*x^(n-1) + ... + fit[n]*x^0 for any degree n. The poly1d function allows you to turn this array into a function that returns the value of the polynomial at any given value x.
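For instance, a quadratic fit on made-up data would look like this (a minimal sketch; the data and variable names are purely illustrative, not from the original example):

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = x ** 2                    # hypothetical quadratic data
fit = np.polyfit(x, y, 2)     # degree-2 fit: fit[0]*x^2 + fit[1]*x + fit[2]
curve = np.poly1d(fit)
curve(6)                      # ~36.0, up to floating-point error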

In general, extrapolation without a well-understood model will give erratic results at best.


Exponential curve fitting.

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

x = np.linspace(0,4,5)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))

fit, cov = curve_fit(func, x, yn)
fit
[ 2.67217435  1.21470107  0.52942728]         #Variables

y
[ 3.          1.18132948  0.68568395  0.55060478  0.51379141]  #Original data

func(x,*fit)
[ 3.20160163  1.32252521  0.76481773  0.59929086  0.5501627 ]  #Fit to original + noise
Daniel
  • thank you good sir, but if you don't mind me asking, what exactly is the 'fit' variable? As in, what does it signify? – corvid Oct 16 '13 at 17:07
  • 1
    @Crowz - It's a linear model. As Ophion's comment mentions, it's `y = fit[0] * x + fit[1]`. – Joe Kington Oct 16 '13 at 17:34
  • would there be a way to imply a model which follows a more exponential path? – corvid Oct 16 '13 at 18:46
  • @Crowz You can always fit an exponential to this; however, exponential fitting is inherently more difficult. Please provide a complete example of what you are trying to do. – Daniel Oct 16 '13 at 20:07
10

As pointed out by this answer to a related question, as of version 0.17.0 of scipy, there is an option in scipy.interpolate.interp1d that allows linear extrapolation. In your case, you could do:

>>> import numpy as np
>>> from scipy import interpolate

>>> x = [1, 2, 3, 4, 5]
>>> y = [5, 10, 15, 20, 25]
>>> f = interpolate.interp1d(x, y, fill_value="extrapolate")
>>> print(f(6))
30.0
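As a quick check with the same f (the values below assume the linear data above; the exact printed form may vary slightly between versions):

>>> print(f(7))
35.0
>>> print(f(100))
500.0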
Noyer282
I'd be thrilled if the documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html) actually explained how the extrapolation works. It doesn't even mention that it's linear, let alone linear based on how many prior or post points (two suffice and would be expected; more would demand a regression). – Bernd Wechner Jun 13 '23 at 00:58
8

After discussing with you in the Python chat: you're fitting your data to an exponential. This should give a relatively good indicator, since you're not looking for long-term extrapolation.

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def exponential_fit(x, a, b, c):
    return a*np.exp(-b*x) + c

if __name__ == "__main__":
    x = np.array([0, 1, 2, 3, 4, 5])
    y = np.array([30, 50, 80, 160, 300, 580])
    fitting_parameters, covariance = curve_fit(exponential_fit, x, y)
    a, b, c = fitting_parameters
    
    next_x = 6
    next_y = exponential_fit(next_x, a, b, c)
    
    plt.plot(y)
    plt.plot(np.append(y, next_y), 'ro')
    plt.show()

The red dot on the far right of the plot shows the next "predicted" point.
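If you also want the numeric value rather than just the plot, you could add a print to the snippet above (not part of the original code):

print(next_x, next_y)   # the fitted exponential evaluated at x = 6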

OldTinfoil
1

Since your data is approximately linear you can do a linear regression, and then use the results from that regression to calculate the next point, using y = w[0]*x + w[1] (keeping the notation from the linked example for y = mx + b).
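For example, a minimal sketch of that regression using numpy.polyfit (the names w, next_x and next_y are just illustrative):

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 10, 15, 20, 25])

w = np.polyfit(x, y, 1)        # w[0] is the slope, w[1] the intercept
next_x = 6
next_y = w[0] * next_x + w[1]  # y = w[0]*x + w[1]
print(next_y)                  # ~30.0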

If your data is not approximately linear and you don't have some other theoretical form for a regression, then general extrapolations (using say polynomials or splines) are much less reliable as they can go a bit crazy beyond the known data points. For example, see the accepted answer here.

tom10
0

Using scipy.interpolate.splrep:

>>> from scipy.interpolate import splrep, splev
>>> d = {1:5, 2:10, 3:15, 4:20, 5:25}
>>> x, y = zip(*d.items())
>>> spl = splrep(x, y, k=1, s=0)
>>> splev(6, spl)
array(30.0)
>>> splev(7, spl)
array(35.0)
>>> int(splev(7, spl))
35
>>> splev(10000000000, spl)
array(50000000000.0)
>>> int(splev(10000000000, spl))
50000000000L

See How to make scipy.interpolate give an extrapolated result beyond the input range?

falsetru
  • 4
    Be careful with using splines to extrapolate. They tend to "overshoot" at the ends. It's very, very easy to get extrapolation estimates orders of magnitude larger or smaller than your data using splines. They're great for interpolation, but a very poor choice for extrapolation. – Joe Kington Oct 16 '13 at 14:52
0

Here's a funny one using only numpy, in case you do not want to depend on scipy:

from numpy.polynomial.polynomial import polyfit, polyval
from numpy import interp, ndarray, piecewise


def interp1d(x: ndarray, xp, fp):
    """1D piecewise linear interpolation with linear extrapolation."""
    return piecewise(
        x,
        [x < xp[0], (x >= xp[0]) & (x <= xp[-1]), x > xp[-1]],
        [
            lambda xi: polyval(xi, polyfit(xp[:2], fp[:2], 1)),
            lambda xi: interp(xi, xp, fp),
            lambda xi: polyval(xi, polyfit(xp[-2:], fp[-2:], 1)),
        ],
    )

This uses plain numpy.interp for interpolation, reverts to a linear polynomial fit to extrapolate out-of-bounds values, and uses numpy.piecewise to string them together.

Instead of polyval(..., polyfit(...)), you could also write the linear extrapolation functions yourself, for example:

lambda xi: fp[0] + (fp[1] - fp[0]) / (xp[1] - xp[0]) * (xi - xp[0])

and so on.
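A quick usage check with the question's data, calling the interp1d defined above (the exact printed formatting may differ):

from numpy import array

xp = array([1.0, 2.0, 3.0, 4.0, 5.0])
fp = array([5.0, 10.0, 15.0, 20.0, 25.0])
interp1d(array([0.5, 3.5, 6.0]), xp, fp)   # -> approximately [ 2.5, 17.5, 30. ]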

djvg