7

Interpolating is easy in pandas using df.interpolate(). Is there a method in pandas that extrapolates with the same elegance? I know my data should be extrapolated with a second-degree polynomial fit.

DeanLa
  • You may have to use `scipy.interpolate.UnivariateSpline` which has an `ext` option. – askewchan Sep 17 '15 at 15:57
  • Related: [Extrapolate values in Pandas DataFrame](https://stackoverflow.com/questions/22491628/extrapolate-values-in-pandas-dataframe), but a simpler case that was able to be solved by another method. – askewchan Sep 17 '15 at 16:01
  • There is now an [answer](http://stackoverflow.com/a/35959909/2087463) to that question with specifics on the polynomial extrapolation. – tmthydvnprt Mar 12 '16 at 17:21

2 Answers

2

"With the same elegance" is a somewhat tall order, but this can be done. As far as I'm aware, you'll need to compute the extrapolated values manually. Note that these values are unlikely to be meaningful unless the data you are operating on actually obey a law of the same form as the fitted function.

For example, since you requested a second degree polynomial fit:

import numpy as np

t = df["time"]
dat = df["data"]
# Least-squares fit of a second-degree polynomial, wrapped as a callable
p = np.poly1d(np.polyfit(t, dat, 2))

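Now p(t) is the value of the best-fit polynomial at time t, including times outside the range of the original data. Note that np.polyfit does not skip NaN values, so fit only on the rows where the data is known.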
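As a minimal end-to-end sketch of this approach: the DataFrame below is made-up sample data (an exact quadratic with its last two values missing), and the names `t_all` and `known` are only illustrative. The polynomial is fitted on the known rows and then evaluated at the times of the missing rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: y = 2*t**2 + 3*t + 1, with the last two values missing
t_all = np.arange(8, dtype=float)
df = pd.DataFrame({"time": t_all, "data": 2 * t_all**2 + 3 * t_all + 1})
df.loc[6:, "data"] = np.nan

# Fit a second-degree polynomial using only the rows where data is known
known = df["data"].notna().to_numpy()
coeffs = np.polyfit(df.loc[known, "time"].to_numpy(),
                    df.loc[known, "data"].to_numpy(), 2)
p = np.poly1d(coeffs)

# Fill the missing rows with the polynomial evaluated at their times
df.loc[~known, "data"] = p(df.loc[~known, "time"].to_numpy())
```

Because the sample data is an exact quadratic, the filled values match the underlying law; with real data the quality of the extrapolation depends on how well a quadratic actually describes it.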

AGML
0

Extrapolation

See [this answer](http://stackoverflow.com/a/35959909/2087463) for how to extrapolate the values of each column of a DataFrame with a 3rd-order polynomial. A polynomial of a different order (e.g. 2nd order) can easily be used by altering func().

Snippet from the answer

import pandas as pd
from scipy.optimize import curve_fit

# Function to curve fit to the data
def func(x, a, b, c, d):
    return a * (x ** 3) + b * (x ** 2) + c * x + d

# Initial parameter guess, just to kick off the optimization
guess = (0.5, 0.5, 0.5, 0.5)

# Create copy of data to remove NaNs for curve fitting
fit_df = df.dropna()

# Place to store function parameters for each column
col_params = {}

# Curve fit each column
for col in fit_df.columns:
    # Get x & y
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    # Curve fit column and get curve parameters
    params = curve_fit(func, x, y, guess)
    # Store optimized parameters
    col_params[col] = params[0]

# Extrapolate each column
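for col in df.columns:
    # Boolean mask of the NaN rows in the column
    missing = df[col].isna().values
    # Index values of those rows, as floats
    x = df.index[missing].astype(float).values
    # Extrapolate those points with the fitted function
    # (use .loc to avoid chained assignment)
    df.loc[missing, col] = func(x, *col_params[col])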
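For the second-degree case asked about in the question, the snippet above might be adapted roughly as follows. This is only a sketch: `func2`, the sample DataFrame, and its column names are made up for illustration, and unlike the snippet above it fits each column on that column's own non-NaN rows rather than on `df.dropna()`.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Second-order polynomial (hypothetical replacement for func above)
def func2(x, a, b, c):
    return a * x**2 + b * x + c

# Made-up example frame: two exact quadratics with trailing NaNs
idx = np.arange(10)
df = pd.DataFrame({"u": 1.0 * idx**2 - 2.0 * idx + 3.0,
                   "v": -0.5 * idx**2 + 4.0 * idx}, index=idx)
df.loc[8:, "u"] = np.nan
df.loc[7:, "v"] = np.nan

for col in df.columns:
    # Rows where this column has data
    known = df[col].notna().to_numpy()
    x_known = df.index[known].to_numpy(dtype=float)
    y_known = df.loc[known, col].to_numpy()
    # Fit the quadratic; p0 is just a starting guess for the optimizer
    params, _ = curve_fit(func2, x_known, y_known, p0=(0.5, 0.5, 0.5))
    # Evaluate the fitted curve at the index values of the missing rows
    x_missing = df.index[~known].to_numpy(dtype=float)
    df.loc[~known, col] = func2(x_missing, *params)
```

For a plain polynomial, np.polyfit gives the same least-squares fit without an initial guess; curve_fit is mainly useful when the model is not a polynomial.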
tmthydvnprt