Fast interpolation of a scattered DataFrame

Question

TL;DR: Is there a fast way to interpolate a scattered 2D dataset at specific coordinates?

If so, could someone provide an example using the sample data and the variables from "Current Solution" (as I'm apparently too stupid to implement it myself)?

Problem:

I need to interpolate (and, if possible, also extrapolate) a DataFrame (size = (34, 18)) of scattered data at specific coordinate points. The DataFrame always stays the same.

The interpolation needs to be fast, as it is done more than 10,000 times in a loop.

The coordinates at which to interpolate are not known in advance, as they change with every loop iteration.

Current Solution:

def Interpolation(a, b):

    #import external modules
    import pandas as pd
    from scipy import interpolate

    #reading .xlsx file into DataFrame
    file  = pd.ExcelFile(file_path)
    mr_df = file.parse('Model_References')
    matrix = mr_df.set_index(mr_df.columns[0])

    #interpolation at specific coordinates
    matrix = matrix.stack().reset_index().values
    value = interpolate.griddata(matrix[:,0:2], matrix[:,2], (a, b), method='cubic')

    return(value)

This method is not acceptable for long-term use, because the two lines of code under #interpolation at specific coordinates alone account for more than 95% of the execution time.

My Ideas:

scipy.interpolate.Rbf seems like the best solution if the data needs to be both interpolated and extrapolated, but as far as I understand it only creates a finer mesh of the existing data and cannot output an interpolated value at specific coordinates.
Creating a smaller 4x4 matrix of the area around the specific coordinates (a, b) might decrease the execution time per loop, but I am struggling to use griddata with the smaller matrix. I created a 5x5 matrix in which the first row and column are the indexes and the remaining 4x4 entries are the data, with the specific coordinates in the middle. But I get a TypeError: list indices must be integers or slices, not tuple, which I do not understand, as I did not change anything else.
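A side note on the first idea: an Rbf object is callable, so it can be evaluated at arbitrary coordinates and not only on a finer mesh. Below is a minimal sketch using the sample data from this question. The rescaling of both axes (ROW_SCALE, COL_SCALE) is an assumption added here, since radial basis functions depend on Euclidean distance and the row coordinates are roughly a thousand times smaller than the column coordinates. The helper name interpolate_rbf and the query point are made up for illustration.

```python
import numpy as np
from scipy.interpolate import Rbf

# Sample data from the question: rows are the first coordinate, columns the second.
rows = np.array([0.0, 0.0001, 0.0002, 0.00021])
cols = np.array([0.0, 0.1, 0.2, 0.3])
data = np.array([
    [-407.0, -351.0, -294.0, -235.0],
    [-333.0, -285.0, -236.0, -185.0],
    [-293.0, -251.0, -206.0, -161.0],
    [-280.0, -239.0, -196.0, -151.0],
])

# Assumption: bring both axes to comparable ranges before building the RBF.
ROW_SCALE = 1.0e4
COL_SCALE = 10.0

# Flatten the grid into scattered (x, y, z) triples and build the RBF once.
rr, cc = np.meshgrid(rows * ROW_SCALE, cols * COL_SCALE, indexing="ij")
rbf = Rbf(rr.ravel(), cc.ravel(), data.ravel())


def interpolate_rbf(a, b):
    """Evaluate the prebuilt RBF at the unscaled coordinates (a, b)."""
    return rbf(np.asarray(a) * ROW_SCALE, np.asarray(b) * COL_SCALE)


# Hypothetical query point, standing in for one loop iteration.
value = interpolate_rbf(0.00015, 0.25)
```

Newer SciPy releases also provide scipy.interpolate.RBFInterpolator, which may be the better choice for new code. Whether an RBF extrapolates sensibly for this data is something to check rather than assume.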

Sample Data:

          0.0     0.1     0.2     0.3
0.0      -407    -351    -294    -235
0.0001   -333    -285    -236    -185
0.0002   -293    -251    -206    -161
0.00021  -280    -239    -196    -151

Are your data points always at the same locations? If so, the triangulation can be pre-computed, see for instance https://stackoverflow.com/q/51858194/8069403 - xdze2, Jun 13 '19 at 10:03
@xdze2 the un-interpolated matrix is always the same, but the coordinates at which the interpolation needs to be done are always different (different in number of decimal places, etc.). And if I used that method, how could I access the interpolated data at specific coordinate points? - GittingGud, Jun 13 '19 at 11:14
Create a surface from your dataframe, using whatever interpolation scheme you want, once. Then evaluate that surface at the locations of interest. If you know all the locations in advance then there is not even any need to loop: take advantage of numpy arrays. https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BivariateSpline.html is an example to fit a splined surface but there are plenty in scipy.interpolate. - Jdog, Jun 13 '19 at 11:15
@Jdog the locations aren't known until that specific iteration of the loop (as it is a simulation calculating one time step after another), and I do not think I can create a surface in advance as I do not know the resolution I need (because it's a simulation and the values aren't predictable) - GittingGud, Jun 13 '19 at 11:18
Taking the example of a spline surface: there is no concept of spatial 'resolution' in terms of that needed to accurately evaluate a position. You can evaluate your surface at any arbitrary position; I believe the call is something like `.ev(x,y)`. If your data frame does not change I can't foresee any reason why you would ever recalculate the interpolation surface inside the loop. - Jdog, Jun 13 '19 at 12:01
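To illustrate the pre-computation suggested in these comments while staying close to the original griddata call: for 2D data, griddata with method='cubic' builds a CloughTocher2DInterpolator on every call, so the triangulation is redone each time. A sketch, assuming the sample data from the question and made-up query points, that builds that interpolator once and only evaluates it inside the loop:

```python
import numpy as np
from scipy.interpolate import CloughTocher2DInterpolator

# Sample data from the question: rows are the first coordinate, columns the second.
rows = np.array([0.0, 0.0001, 0.0002, 0.00021])
cols = np.array([0.0, 0.1, 0.2, 0.3])
data = np.array([
    [-407.0, -351.0, -294.0, -235.0],
    [-333.0, -285.0, -236.0, -185.0],
    [-293.0, -251.0, -206.0, -161.0],
    [-280.0, -239.0, -196.0, -151.0],
])

# Scattered (row, col) points, the same layout griddata expects.
rr, cc = np.meshgrid(rows, cols, indexing="ij")
points = np.column_stack([rr.ravel(), cc.ravel()])

# Built once: triangulation and gradient estimation happen here.
# rescale=True is used because the two axes differ by orders of magnitude.
surface = CloughTocher2DInterpolator(points, data.ravel(), rescale=True)

# Hypothetical per-iteration coordinates; only the cheap evaluation is in the loop.
results = []
for a, b in [(0.00005, 0.05), (0.0001, 0.1), (0.00015, 0.25)]:
    results.append(np.asarray(surface(a, b)).item())
```

Like griddata, this returns nan for points outside the convex hull of the data, so it does not cover the extrapolation requirement.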

score 1 · Accepted Answer · answered Jun 13 '19 at 12:32

Thanks to @Jdog's comment I was able to figure it out:

Creating the spline once before the loop with scipy.interpolate.RectBivariateSpline and evaluating it at specific coordinates with scipy.interpolate.RectBivariateSpline.ev decreased the execution time of the interpolation from 255 s to 289 ms.

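def Interpolation(mesh, a, b):

    #interpolation at specific coordinates
    value = mesh.ev(a, b)

    return(value)

#%%

#import external modules
import pandas as pd
from scipy import interpolate

#reading .xlsx file into DataFrame
file  = pd.ExcelFile(file_path)
mr_df = file.parse('Model_References')
matrix = mr_df.set_index(mr_df.columns[0])

#row and column labels are the grid coordinates (must be strictly ascending)
a_index = matrix.index.to_numpy(dtype=float)
b_index = matrix.columns.to_numpy(dtype=float)

#create the spline once, before the loop
mesh = interpolate.RectBivariateSpline(a_index, b_index, matrix.to_numpy())

#a and b are the coordinates computed in each iteration
for iterations in loop:
    value = Interpolation(mesh, a, b)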
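For anyone who wants to try this without the Excel file, here is a self-contained sketch of the same approach using the sample data from the question. The DataFrame is built inline instead of being read from file_path, and the query coordinates are made up to stand in for the values the simulation would produce in each step.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import RectBivariateSpline

# Sample data from the question: index = first coordinate, columns = second.
matrix = pd.DataFrame(
    [
        [-407, -351, -294, -235],
        [-333, -285, -236, -185],
        [-293, -251, -206, -161],
        [-280, -239, -196, -151],
    ],
    index=[0.0, 0.0001, 0.0002, 0.00021],
    columns=[0.0, 0.1, 0.2, 0.3],
    dtype=float,
)

# Grid coordinates must be strictly ascending for RectBivariateSpline.
a_index = matrix.index.to_numpy(dtype=float)
b_index = matrix.columns.to_numpy(dtype=float)

# Build the spline once; the defaults give a bicubic spline with no smoothing.
mesh = RectBivariateSpline(a_index, b_index, matrix.to_numpy())

# Hypothetical per-iteration coordinates; only .ev() runs inside the loop.
queries = [(0.00005, 0.05), (0.0001, 0.1), (0.00015, 0.25)]
values = [mesh.ev(a, b).item() for a, b in queries]
```

With the default smoothing of zero the spline passes through the grid values, so the second query, which sits on a grid node, should reproduce the table entry. .ev also accepts arrays, so several coordinates that are known at the same time can be evaluated in one call.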
