1

I want to interpolate between different models. To make things easier, my data is shown below:

enter image description here

I have 10 different simulations (which I will call z). For each z I have an array x and an array y (where for a given z, len(x)=len(y)). For example:

for z=1: x.shape=(1200,) and y.shape=(1200,)

for z=2: x.shape=(1250,) and y.shape=(1250,)

for z=3: x.shape=(1236,) and y.shape=(1236,)

and so on ...

I want to interpolate so that for a given z and x, I get y. For example, for z=2.5 and x=10**9, the code outputs y. I am assuming that:

y = a*x + b*z + c where of course I don't know a, b, and c.

My question is that how do I store the data in a grid? I am confused since for a different z the size of x and y differs. How is it possible to build a grid?

UPDATE

I was able to partially solve my problem. What I did first is that I interpolated between x and y using interp1d. It worked perfectly fine. I then created a new grid of x and y values. Briefly the method is:

f = interp1d(x, y, kind='linear')
new_x = np.linspace(10**7, 4*10**9, 10000)
new_y = f(new_x)

I then interpolated x, y, and z:

ff = LinearNDInterpolator( (x, z), y)

To test whether the method work, here's a plot with z=3.

enter image description here

The plot looks good till x=10**8. Indeed, the line deviates from the original model. Here's a plot when I further zoom in:

enter image description here

The interpolation obviously is not good when x > 10**8. How can I fix it?

aloha
  • 4,554
  • 6
  • 32
  • 40
  • Are there standard spacings between x? Is the grid necessary, or just the easy lookup? If just easy lookup you can put everything in a `dict` of `dict` as in `my_y = ys[my_z][my_x]` – mitoRibo Sep 26 '16 at 22:12
  • [This](http://stackoverflow.com/questions/33391495/iterated-interpolation-first-interpolate-grids-then-interpolate-value/33451160#33451160) might be similar. – Andras Deak -- Слава Україні Sep 26 '16 at 22:14
  • @rbierman, the grid is necessary, as I want to interpolate in between the different models. There is no standard spacing between the `x`, they're random. – aloha Sep 26 '16 at 22:21
  • @aloha ok this sounds like a hard problem. Can you explain why you set `z=2.5` above, I thought `z` was the simulation number (an int)? Does interpolate between models mean averaging the models together? If they don't share x's, you'll have to do some sort of interpolation within each model before comparing model to model. – mitoRibo Sep 26 '16 at 22:38
  • @rbierman, I set `z=2.5` as an example, I could have said `z=2.3546372`. It is a totally random number. The models do not have the share the same `x`'s. I want to try to interpolate between `x` and `y` first and then include `z`. Andra's link is similar to what I am looking for. – aloha Sep 26 '16 at 22:50
  • @AndrasDeak, the link is very similar to what I need. However in your answer/example, you have that `x` and `y` have the same shape. In my case, for different values of `z`, `x` and `y` don't have the same shape. I mentioned this in my question. How can I solve this? – aloha Sep 27 '16 at 10:32
  • You could probably use something like `scipy.interpolate.griddata` to construct a single regular grid based on your series of grids. – Andras Deak -- Слава Україні Sep 27 '16 at 10:36
  • I already checked `scipy.interpolate.griddata`. The problem is that I need to input the points at which I want to interpolate. If I want to do this 10,000 times each one at a time, then I build my grid 10,000 times which is too slow. I am looking for a more efficient method. @AndrasDeak – aloha Sep 27 '16 at 10:38
  • There are [other 2d interpolation options](http://stackoverflow.com/q/37872171/5067311), but the spline-based ones are not very reliable in my experience. However, you can use [the underlying interpolator that griddata uses](http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata), and call that multiple times after constructing it once. For linear case this is `LinearNDInterpolator`, for cubic it's `CloughTocher2DInterpolator`. – Andras Deak -- Слава Україні Sep 27 '16 at 10:50
  • @AndrasDeak, I updated my question. Can you please check it out. – aloha Oct 05 '16 at 13:19
  • @aloha I added an answer to show how I think this should/could be done, please let me know how it works out for you. – Andras Deak -- Слава Україні Oct 05 '16 at 19:28
  • @aloha have you had the time to check out my answer? – Andras Deak -- Слава Україні Oct 15 '16 at 19:47
  • Hey @AndrasDeak. Sorry for replying so late, I was able to fix the problem by using `scipy.interpolate.RegularGridInterpolator`. I hope this helps ... It is super fast too – aloha Nov 15 '16 at 11:08
  • "*I hope this helps*", well, the point is that it helps you;) If you have the time, you should add an answer of your own, and accept it (after 2 days when the system will let you), so that the question no longer looks as one that needs an answer. – Andras Deak -- Слава Україні Nov 15 '16 at 11:25

2 Answers2

1

What you're doing seems a bit weird to me, at least you seem to use a single set of y values to do the interpolation. What I suggest is not performing two interpolations one after the other, but considering your y(z,x) function as the result of a pure 2d interpolation problem.

So as I noted in a comment, I suggest using scipy.interpolate.LinearNDInterpolator, the same object that griddata uses under the hood for bilinear interpolatin. As we've also discussed in comments, you need to have a single interpolator that you can query multiple times afterwards, so we have to use the lower-level interpolator object, as that is callable.

Here's a full example of what I mean, complete with dummy data and plotting:

import numpy as np
import scipy.interpolate as interp
import matplotlib.pyplot as plt

# create dummy data
zlist = range(4)  # z values
# one pair of arrays for each z value in a list:
xlist = [np.linspace(-1,1,41),
         np.linspace(-1,1,61),
         np.linspace(-1,1,55),
         np.linspace(-1,1,51)]
funlist = [lambda x:0.1*np.ones_like(x),
           lambda x:0.2*np.cos(np.pi*x)+0.4,
           lambda x:np.exp(-2*x**2)+0.5,
           lambda x:-0.7*np.abs(x)+1.7]
ylist = [f(x) for f,x in zip(funlist,xlist)]

# create contiguous 1d arrays for interpolation
all_x = np.concatenate(xlist)
all_y = np.concatenate(ylist)
all_z = np.concatenate([np.ones_like(x)*z for x,z in zip(xlist,zlist)])

# create a single linear interpolator object
yfun = interp.LinearNDInterpolator((all_z,all_x),all_y)

# generate three interpolated sets: one with z=2 to reproduce existing data,
# two with z=1.5 and z=2.5 respectively to see what happens
xplot = np.linspace(-1,1,30)
z = 2
y_repro = yfun(z,xplot)
z = 1.5
y_interp1 = yfun(z,xplot)
z = 2.5
y_interp2 = yfun(z,xplot)

# plot the raw data (markers) and the two interpolators (lines)
fig,ax = plt.subplots()
for x,y,z,mark in zip(xlist,ylist,zlist,['s','o','v','<','^','*']):
    ax.plot(x,y,'--',marker=mark,label='z={}'.format(z))
ax.plot(xplot,y_repro,'-',label='z=2 interp')
ax.plot(xplot,y_interp1,'-',label='z=1.5 interp')
ax.plot(xplot,y_interp2,'-',label='z=2.5 interp')
ax.set_xlabel('x')
ax.set_ylabel('y')
# reduce plot size and put legend outside for prettiness, see also http://stackoverflow.com/a/4701285/5067311
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))

plt.show()

result

You didn't specify how you series of (x,y) array pairs are stored, I used a list of numpy ndarrays. As you see, I flattened the list of 1d arrays into a single set of 1d arrays: all_x, all_y, all_z. These can be used as scattered y(z,x) data from which you can construct the interpolator object. As you can see in the result, for z=2 it reproduces the input points, and for non-integer z it interpolates between the relevant y(x) curves.

This method should be applicable to your dataset. One note, however: you have huge numbers on a logarithmic scale on your x axis. This alone could lead to numeric instabilities. I suggest that you also try performing the interpolation using log(x), it might behave better (this is just a vague guess).

0

It seems that in your problem the curves y(x) are well behaving, so you could probably just interpolate y(x) for the given values of z first and then interpolate between the obtained y-values afterwards.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

import random

#####
# Generate some data
#####
generate = lambda x, z: 1./(x+1.)+(z*x/75.+z/25.)

def f(z):
    #create an array of values between zero and 100 of random length
    x = np.linspace(0,10., num=random.randint(42,145))
    #generate corresponding y values
    y = generate(x, z)
    return np.array([x,y])

Z = [1, 2, 3, 3.6476, 4, 5.1]
A = [f(z) for z in Z]
#now A contains the dataset of [x,y] pairs for each z value

#####
# Interpolation
#####
def do_interpolation(x,z):
    #assume Z being sorted in ascending order
    #look for indizes of z values closest to given z
    ig = np.searchsorted(Z, z)
    il = ig-1
    #interpolate y(x) for those z values
    yg = np.interp(x, A[ig][0,:], A[ig][1,:])
    yl = np.interp(x, A[il][0,:], A[il][1,:])
    #linearly interpolate between yg and yl  
    return yl + (yg-yl)*float(z-Z[il])/(Z[ig] - Z[il])  

# do_interpolation(x,z) will now provide the interpolated data
print do_interpolation( np.linspace(0, 10), 2.5) 

#####
# Plotting, use Slider to change the value of z. 
#####
fig=plt.figure()
fig.subplots_adjust(bottom=0.2)
ax=fig.add_subplot(111)
for i in range(len(Z)):
    ax.plot(A[i][0,:] , A[i][1,:], label="{z}".format(z=Z[i]) )

l, = ax.plot(np.linspace(0, 10) , do_interpolation( np.linspace(0, 10), 2.5), label="{z}".format(z="interpol"), linewidth=2., color="k" )

axn1 = plt.axes([0.25, 0.1, 0.65, 0.03], axisbg='#e4e4e4')
sn1 = Slider(axn1, 'z', Z[0], Z[-1], valinit=2.5)
def update(val):
    l.set_data(np.linspace(0, 10), do_interpolation( np.linspace(0, 10), val))
    plt.draw()
sn1.on_changed(update)

ax.legend()
plt.show()
ImportanceOfBeingErnest
  • 321,279
  • 53
  • 665
  • 712