
I know this can be quite confusing, so please let me know if this explanation needs some editing.

Let's say that I have input data in this format:

for a given pressure p_0 --> 2x2 grid of temperatures (T_0) that refer to this pressure value

for a given pressure p_1 --> 2x2 grid of temperatures (T_1) that refer to this pressure value

p_0 = 0
T_0 = np.array([[1, 4], [3, 2]])

p_1 = 1
T_1 = np.array([[1, 6], [4, 4]])

p = np.array([p_0, p_1])
T = np.array([T_0, T_1])

Now, I am given a 2x2 grid of new pressure values

p_target = np.array([[0.1, 0.4], [0.3, 0.2]])

and I would like to get a 2x2 grid of interpolated temperature values, using the input data.

The way I'm doing this is for each point of the grid, I build an interpolation function and then I use it to get the new interpolated temperature value for that grid point:

import numpy as np
from scipy.interpolate import interp1d

T_new = np.empty(p_target.shape)

for ix,iy in np.ndindex(p_target.shape):
    f = interp1d(p, T[:,ix,iy])
    T_new[ix,iy] = f(p_target[ix,iy])

T_new

array([[1. , 4.8],
       [3.3, 2.4]])

As it's easy to guess, this is quite slow for large arrays, and it seems to be quite against the numpy way of doing things.

EDIT: I'm using interp1d also because it allows for extrapolation too, which is an option I'd like to keep.

duff18
  • What is T? Can you show us a few more input-output pairs? – hkchengrex Sep 01 '20 at 09:34
  • Could you tell us about the format of these variables. If I understand correctly, you have one feature (pressure) and 4 targets (temperatures), and you want to linearly interpolate new temperatures from pressure using the two points provided as `p` and `T`. And your main issue is performance. Have I got it right? – jlandercy Sep 01 '20 at 09:34
  • 1
    Could you provide a differently sized `p_target` so we can understand what a larger matrix would look like. Also, is it mandatory to keep this structure? I have the feeling things are more complicated than they need to be. – jlandercy Sep 01 '20 at 09:51
  • @hkchengrex T represents the temperature. I have edited the code to make the first part clearer and less ambiguous. I'm not sure about what you mean by the input-output pairs. If the interpolation function is `f(x,y)` then `x->p`, `y->T`, and my `x_new` is `p_target`, since the goal is to run `f(x_new)`. @jlandercy I think you got it right. The performance problem begins when instead of having 2x2 matrixes for `T_0`, `T_1` and `p_target` you have something much larger like 1000x1000. Not sure about what you mean by "structure", honestly all I care about is to get to that final result – duff18 Sep 01 '20 at 10:31
  • Oh and in the real life problem `len(p)` is much larger too – duff18 Sep 01 '20 at 10:38
  • In your actual problem, do you have more than two `T` input values (and if so, are those evenly spaced within a [0, 1] domain)? Do you need to do this for a single `p_target` at a time, or potentially for many? – jdehesa Sep 02 '20 at 10:52
  • @jdehesa yes more than two `T`, no not evenly spaced, potentially for many `p_target` – duff18 Sep 02 '20 at 15:30

2 Answers


You can just compute the interpolation yourself. Here I assume you have more than two T values and that p is not necessarily evenly spaced. Also, the code assumes you have several p_target grids, but it obviously works for just a single one.

import numpy as np

p_0 = 0
T_0 = np.array([[1., 4.], [3., 2.]])
p_1 = 1
T_1 = np.array([[1., 6.], [4., 4.]])
p = np.array([p_0, p_1])
T = np.array([T_0, T_1])
p_target = np.array([[0.1, 0.4], [0.3, 0.2]])
# Assume you may have several of p_target values
p_target = np.expand_dims(p_target, 0)

# Find the base index for each interpolated value (assume p is sorted)
idx_0 = (np.searchsorted(p, p_target) - 1).clip(0, len(p) - 2)
# And the next index
idx_1 = idx_0 + 1
# Get p values for each interpolated value
a = p[idx_0]
b = p[idx_1]
# Compute interpolation factor
alpha = ((p_target - a) / (b - a)).clip(0, 1)
# Get interpolation values
v_0 = np.take_along_axis(T, idx_0, axis=0)
v_1 = np.take_along_axis(T, idx_1, axis=0)
# Compute interpolation
out = (1 - alpha) * v_0 + alpha * v_1
print(out)
# [[[1.  4.8]
#   [3.3 2.4]]]

EDIT: If you want linear extrapolation, simply do not clip the alpha values:

alpha = ((p_target - a) / (b - a))
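A quick way to sanity-check the unclipped version is to compare it against `interp1d` with `fill_value="extrapolate"`. This is a small self-contained sketch using the question's toy data, with some out-of-range target pressures added to exercise the extrapolation path (the specific target values here are made up for the check):

```python
import numpy as np
from scipy.interpolate import interp1d

# Toy data from the question
p = np.array([0.0, 1.0])
T = np.array([[[1., 4.], [3., 2.]],
              [[1., 6.], [4., 4.]]])
# Targets including out-of-range values (-0.5 and 1.3) to test extrapolation
p_target = np.array([[[-0.5, 0.4], [1.3, 0.2]]])

# Manual interpolation with unclipped alpha -> linear extrapolation
idx_0 = (np.searchsorted(p, p_target) - 1).clip(0, len(p) - 2)
idx_1 = idx_0 + 1
a, b = p[idx_0], p[idx_1]
alpha = (p_target - a) / (b - a)  # no .clip(0, 1)
v_0 = np.take_along_axis(T, idx_0, axis=0)
v_1 = np.take_along_axis(T, idx_1, axis=0)
out = (1 - alpha) * v_0 + alpha * v_1

# Cross-check against the loop-based interp1d with extrapolation enabled
ref = np.empty((2, 2))
for ix, iy in np.ndindex(2, 2):
    f = interp1d(p, T[:, ix, iy], fill_value="extrapolate")
    ref[ix, iy] = f(p_target[0, ix, iy])
print(np.allclose(out[0], ref))  # True
```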
jdehesa
  • is this a method that uses only the two points that are "neighbors" of the point where we want to interpolate to ? I'm assuming this then wouldn't work for extrapolation (I'll add that I need this in my post). For the interpolation part, is this going to be faster than my original method ? – duff18 Sep 02 '20 at 15:25
  • @duff18 Yes, this interpolates each point in the grid using the `T` value above and below. `p_target` values out of the range of `p` are set to the lowest or highest corresponding `T` value, by extrapolating you mean actually using the "slope" with respect to the previous `T` to linearly continue the trend outside of the bounds? For interpolation, this should be faster than looping, although I haven't tested the performance. – jdehesa Sep 02 '20 at 15:31
  • For the extrapolation, yes, I'm guessing that'd be the easiest way of doing things. Thanks, I'll then try to see how this method's speed compares – duff18 Sep 02 '20 at 15:33
  • @duff18 Actually for linear extrapolation you just have to leave the `alpha` values unclipped, I added that to the answer. – jdehesa Sep 02 '20 at 15:36
  • This means that in case the user can not know beforehand whether he needs to interpolate or extrapolate, the code should be able to figure out that itself, right ? – duff18 Sep 03 '20 at 13:20
  • @duff18 Well you can tweak the code to either clip the alpha, not clip it or raise an error if it goes out of [0, 1], depending on what you need. – jdehesa Sep 03 '20 at 15:32
  • then it seems that for both interpolation and extrapolation one can simply use a standard interpolation formula such as `out = v_0 + (v_1 - v_0)/(b - a)*(p_target - a)`, which it's also slightly faster according to my test – duff18 Sep 08 '20 at 10:55

I added some parameters for the dimensions; from your choice of n_x = n_y = n_p = 2, the dependencies were not that clear.

import numpy as np
from scipy.interpolate import interp1d, interp2d, dfitpack

n_x = 30
n_y = 40
n_p = 50
T = np.random.random((n_p, n_x, n_y)) * 100
p = np.random.random(n_p)
p[np.argmin(p)] = 0
p[np.argmax(p)] = 1
p_target = np.random.random((n_x, n_y))

T_new = np.empty(p_target.shape)

for ix, iy in np.ndindex(p_target.shape):
    f = interp1d(p, T[:, ix, iy])
    T_new[ix, iy] = f(p_target[ix, iy])

Then a word about your modelling. If I understood correctly, you want temperature_xy = fun_xy(pressure), a separate function for each coordinate on your spatial grid. Another option might be to include the spatial components in a combined function temperature_xy = fun(pressure, x, y). For the second approach, have a look at scipy.interpolate.griddata.
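As a rough illustration of the combined-function idea, here is a minimal sketch using scipy.interpolate.griddata on random toy data (the sizes and data here are made up, and note that griddata's linear mode does not extrapolate, so out-of-hull queries would come back as NaN):

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
n_p, n_x, n_y = 5, 4, 4
p = np.linspace(0.0, 1.0, n_p)
T = rng.random((n_p, n_x, n_y))

# Sample points (pressure, x, y) and their temperature values
pp, xx, yy = np.meshgrid(p, np.arange(n_x), np.arange(n_y), indexing='ij')
points = np.column_stack([pp.ravel(), xx.ravel(), yy.ravel()])
values = T.ravel()

# One new pressure per (x, y) grid point, kept strictly inside the p range
p_target = rng.uniform(0.1, 0.9, size=(n_x, n_y))
xq, yq = np.meshgrid(np.arange(n_x), np.arange(n_y), indexing='ij')
query = np.column_stack([p_target.ravel(), xq.ravel(), yq.ravel()])

T_new = griddata(points, values, query, method='linear').reshape(n_x, n_y)
```

Since linear interpolation is a convex combination of sample values, the result stays within the range of T for in-hull queries.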

You can rearrange the first approach to make it work with interp2d(). For this, the first dimension is the pressure x=pressure and the second dimension represents the combined spatial dimensions y=product(x, y). To make this behave as n_x * n_y independent interpolations over the pressure values, I just use the same dummy values 0, 1, 2... for the spatial components both when creating the interpolation and when evaluating it. Because the evaluation of interp2d() normally only works on grid coordinates, I used the method provided by user6655984 to evaluate the function only on a specific set of points.

def evaluate_interp2d(f, x, y):
    """https://stackoverflow.com/a/47233198/7570817"""
    return dfitpack.bispeu(f.tck[0], f.tck[1], f.tck[2], f.tck[3], f.tck[4], x, y)[0]

f2 = interp2d(x=p, y=np.arange(n_x*n_y), z=T.reshape(n_p, n_x*n_y).T)

T_new2 = evaluate_interp2d(f=f2, x=p_target.ravel(), y=np.arange(n_x*n_y))
T_new2 = T_new2.reshape(n_x, n_y)

print(np.allclose(T_new, T_new2))
# True

With those settings I get a time improvement of almost 10x. But if you use even larger values, like n_x = n_y = 1000, the memory usage of this custom interp2d approach grows too large and your iterative approach wins.

# np=50
#    nx*ny      1e2      1e4      1e5      1e6
# interp1d  0.0056s  0.3420s  3.4133s  33.390s
# interp2d  0.0004s  0.0388s  2.0954s  191.66s

With this knowledge you could loop over a big 1000x1000 grid and process 100x100 pieces sequentially, then you would end up at around 3sec instead of 30sec.

def interpolate2d_flat(p, p_target_flat, T_flat):
    n_p, n_xy = T_flat.shape
    f2 = interp2d(x=p, y=np.arange(n_xy), z=T_flat.T)
    return evaluate_interp2d(f=f2, x=p_target_flat, y=np.arange(n_xy))


n_splits = n_x * n_y // 1000  # So each patch has size n_p*1000, can be changed 

# Flatten and split the spatial dimensions
T_flat_s = np.array_split(T.reshape(n_p, n_x*n_y), n_splits, axis=1)
p_target_flat_s = np.array_split(p_target.ravel(), n_splits, axis=0)

# Loop over the patches
T_new_flat = np.concatenate([interpolate2d_flat(p=p, p_target_flat=ptf, T_flat=Tf)
                             for (ptf, Tf) in zip(p_target_flat_s, T_flat_s)])
T_new2 = T_new_flat.reshape(n_x, n_y)
scleronomic
  • Thanks ! `griddata` doesn't seem to support `fill_value = “extrapolate”` which is something I wanted to have the option to use. As for the `interp2d` approach, it's a big punch in the eye for readability but I'd have used it if it could guarantee a substantial speed-up but unfortunately that doesn't seem to be the case. – duff18 Sep 02 '20 at 07:29
  • It's great you found out that one can save time by using the `interp1d` approach sequentially though. The price to pay is to "unpack" the big array, but it sounds like it's worth it – duff18 Sep 02 '20 at 07:31
  • You mean using the `interp2d` approach iteratively? I updated my answer to include this splitting, I got a speed improvement of almost 20x, so maybe it is worth it for you. Hope I could help. – scleronomic Sep 02 '20 at 10:41
  • Yes, `interp2d`, my bad – duff18 Sep 02 '20 at 15:38