
I am trying to vectorize my code and have hit a roadblock. I have:

  • an n×d array of x values [[x1],[...],[xn]] (where each row [xi] contains d points [xi1, ..., xid])
  • an n×d array of y values [[y1],[...],[yn]] (where each row [yi] contains d points [yi1, ..., yid])
  • an n×1 array of x' values [[x'1],[...],[x'n]], for each of which I would like to interpolate a y value based on the corresponding row of x and y

The only thing I can think of is a list comprehension like [np.interp(xq[i], x[i, :], y[i, :]) for i in range(n)] (with xq the array of x' values). I'd like a faster, vectorized option if one exists. Thanks for the help!
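
For concreteness, here is a tiny sketch of the setup with made-up data, ending with that list comprehension:

import numpy as np

n, d = 4, 6
x = np.sort(np.random.rand(n, d), axis=1)  # n x d, each row is its own increasing x-grid
y = np.random.rand(n, d)                   # n x d, y values for each row
xq = np.random.rand(n)                     # one query point per row (the x' values)

# Row-by-row interpolation: xq[i] is looked up on (x[i], y[i])
yq = np.array([np.interp(xq[i], x[i], y[i]) for i in range(n)])
print(yq.shape)  # (n,)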

mfgeng
  • Perhaps you can use some kind of multidimensional interpolation; have you tried any of [these methods](https://docs.scipy.org/doc/scipy/reference/interpolate.html#multivariate-interpolation)? – Lith Mar 09 '21 at 18:52
  • That's a good idea, thanks @Lith. So to clarify, I would use a 2D interpolation function, but just never interpolate between the rows? So make sure the x values of my x,y query points exactly match the x values used to create the interpolation grid? – mfgeng Mar 29 '21 at 04:39

1 Answer


This is hardly an answer, but it may still be useful to someone (if not, feel free to delete it). By the way, I think I misunderstood your question at first: what you have is a collection of n different one-dimensional datasets or functions y(x), each of which you want to interpolate at its own query point (correct me if I'm wrong).

As it turns out, doing this by multidimensional interpolation is a terrible approach. The idea I had was to add a new dimension to the data, so that your n datasets are mapped into a single dataset in which the new dimension distinguishes between the different rows xi, i = 1, 2, ..., n. In other words, you assign a value in this new dimension, say z, to every row of x; this way, the different functions are mapped onto distinct slices of the higher-dimensional space.
However, this approach is slower than the np.interp list comprehension, by at least an order of magnitude on my machine. I guess it has to do with two-dimensional interpolation algorithms being at best O(n log(n)) (this is a guess); in that sense, it seems more efficient to perform many small interpolations on separate datasets than one big interpolation.

Anyways, the approach is shown in the following snippet:

import numpy as np
from scipy.interpolate import LinearNDInterpolator

def vectorized_interpolation(x, y, xq):
    """
    Vectorized option using LinearNDInterpolator
    """
    # Dummy coordinate in the added dimension: one z value per row of x
    z = np.arange(x.shape[0])
    # Each z value is repeated d times, once for every point in its row,
    # so every (x, z) pair is tagged with the dataset it belongs to
    points = list(zip(x.ravel(), np.repeat(z, x.shape[1])))
    interpolant = LinearNDInterpolator(points, y.ravel())

    # Query point i is (xq[i], z[i]), i.e. xq[i] evaluated within dataset i
    return interpolant(xq, z)

def non_vectorized_interpolation(x, y, xq):
    """
    Your non-vectorized solution
    """
    return np.array([np.interp(xq[i], x[i], y[i]) for i in range(x.shape[0])])

if __name__ == "__main__":
    n, d = 100, 500
    x = np.linspace(0, 2*np.pi, n*d).reshape((n, d))
    y = np.sin(x)
    xq = np.linspace(0, 2*np.pi, n)
    
    yq1 = vectorized_interpolation(x, y, xq)
    yq2 = non_vectorized_interpolation(x, y, xq)
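
If you want to verify that the two approaches agree and reproduce the timing comparison, something along these lines could be appended inside the __main__ block above (timeit is just one way to measure it; the exact numbers will of course depend on your machine):

    # Agreement check and rough timing, appended to the __main__ block
    import timeit

    print("Results agree:", np.allclose(yq1, yq2))

    t_vec = timeit.timeit(lambda: vectorized_interpolation(x, y, xq), number=3)
    t_loop = timeit.timeit(lambda: non_vectorized_interpolation(x, y, xq), number=3)
    print(f"LinearNDInterpolator: {t_vec:.3f} s, np.interp loop: {t_loop:.3f} s")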

The only advantage of the vectorized solution is that LinearNDInterpolator (and some of the other scipy.interpolate classes) explicitly constructs the interpolant, so you can reuse it if you plan to interpolate the same datasets several times, avoiding repeated setup work. Another thing you could try is multiprocessing if you have several cores on your machine, but that is not vectorization, which is what you asked for. Sorry I can't be of more help.
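
For instance, a rough sketch of that reuse, using the x, y and xq from the script above (xq2 is just a second, made-up set of query points): the triangulation is built once, and each subsequent call only evaluates it.

# Build the interpolant once...
z = np.arange(x.shape[0])
interpolant = LinearNDInterpolator(
    list(zip(x.ravel(), np.repeat(z, x.shape[1]))), y.ravel()
)

# ...then evaluate it for as many query sets as you like
xq2 = 0.99 * xq                  # a second, made-up set of query points
yq_first = interpolant(xq, z)    # no rebuilding of the triangulation here
yq_second = interpolant(xq2, z)  # nor here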

Lith
  • Thanks for the help! I modified your code with options for using interp2d or this Inverse Distance KDTree method ([link](https://stackoverflow.com/questions/3104781/inverse-distance-weighted-idw-interpolation-with-python/3119544)) as well. While the KDTree gave some improvement in timing, the non-vectorized option was still the fastest by more than an order of magnitude, which further supports your conclusion. – mfgeng Apr 07 '21 at 23:42