Interpolate across sparse grid in pandas

Question

I have a grid of numbers (option volatilities, see picture below) that has only a few entries toward the ends of the grid (i.e. it is very sparse). I would like to interpolate/fill in this grid using the data from the entire grid, i.e. with a 2-D interpolation method. I've seen some examples (e.g. here), but I'm not familiar with the SciPy and NumPy APIs, and they seem to do a lot of plotting that is unrelated to the actual interpolation.

To be clear, I am currently storing this data in a pandas DataFrame with indices OPT_EXPIRE_DT and OPT_STRIKE_PX, and I would like to end up with another pandas DataFrame, but I can convert to other data types as needed.

Thanks for any help!

Be careful about filling in missing prices or implied volatilities. There are many ways of passing money between two parties via the options market, so there are some "strange" contracts and prices out there. I would also google volatility smiles and smirks, read about expected IV curves, and keep them in mind when interpolating data. — Paul Brennan, Jan 08 '21 at 23:34
Please don't post images of code, data, or Tracebacks. Copy and paste it as text then format it as code (select it and type `ctrl-k`) ... [Discourage screenshots of code and/or errors](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors)...[Why not upload images of code on SO when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question) ... [You should not post code as an image because:...](https://meta.stackoverflow.com/a/285557/2823755) — wwii, Jan 08 '21 at 23:43
What is your question? When you tried some of the methods you found, did they produce interpolated values that looked correct? [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — wwii, Jan 08 '21 at 23:44

score 1 · Accepted Answer · answered Jan 09 '21 at 00:36

Here's an example. First, let's create a DataFrame with missing values:

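import numpy as np
import pandas as pd

N = 5
# start from an N x N grid of NaNs
df = pd.DataFrame(np.full((N, N), np.nan))
df.iloc[0] = 1    # only the first row is known
df.iloc[-1] = 2   # only the last row is known
df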

Output:

     0    1    2    3    4
0  1.0  1.0  1.0  1.0  1.0
1  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN
4  2.0  2.0  2.0  2.0  2.0

Then we can use `scipy.interpolate.griddata` to interpolate the missing cells from the known ones:

from scipy import interpolate

# create meshgrid: x holds row indices, y holds column indices, both shaped like df
x, y = np.mgrid[0:N, 0:N]

# find indices of non-missing values
ix_notna = df.notna().values

# interpolate
z_interpolated = interpolate.griddata(
    (x[ix_notna], y[ix_notna]),
    df.values[ix_notna],
    (x, y),
    method='linear')

# griddata returns a numpy array, so convert it back to a DataFrame with the original labels
df_interpolated = pd.DataFrame(z_interpolated, index=df.index, columns=df.columns)
df_interpolated

Output:

      0     1     2     3     4
0  1.00  1.00  1.00  1.00  1.00
1  1.25  1.25  1.25  1.25  1.25
2  1.50  1.50  1.50  1.50  1.50
3  1.75  1.75  1.75  1.75  1.75
4  2.00  2.00  2.00  2.00  2.00
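The example above uses integer row/column positions as the interpolation coordinates. In the question the grid is indexed by expiry date and strike, which are usually unevenly spaced, so it may be more faithful to interpolate on those values directly. Below is a minimal sketch under stated assumptions: the data arrive in long format with a MultiIndex (OPT_EXPIRE_DT, OPT_STRIKE_PX); the column name `IVOL`, the helper `fill_grid`, the sample numbers, and measuring time as days since the first expiry are all hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import griddata


def fill_grid(grid: pd.DataFrame, method: str = "linear") -> pd.DataFrame:
    """Hypothetical helper: fill NaN cells of a 2-D grid with griddata.

    Assumes the index and columns are numeric (e.g. days to expiry and strike),
    so distances between rows/columns reflect their real spacing.
    """
    # Coordinates of every cell, laid out like the data ("ij" = rows first)
    row_coords, col_coords = np.meshgrid(
        grid.index.to_numpy(dtype=float),
        grid.columns.to_numpy(dtype=float),
        indexing="ij",
    )
    values = grid.to_numpy(dtype=float)
    known = ~np.isnan(values)
    filled = griddata(
        (row_coords[known], col_coords[known]),
        values[known],
        (row_coords, col_coords),
        method=method,
        rescale=True,  # scale both axes to [0, 1] before triangulating
    )
    # Keep the original labels so the result is still a labelled DataFrame
    return pd.DataFrame(filled, index=grid.index, columns=grid.columns)


# Hypothetical long-format data: one implied vol per (expiry, strike) pair
expiries = pd.to_datetime(["2021-02-19", "2021-03-19", "2021-06-18"])
strikes = [90.0, 100.0, 110.0]
idx = pd.MultiIndex.from_product([expiries, strikes],
                                 names=["OPT_EXPIRE_DT", "OPT_STRIKE_PX"])
long_df = pd.DataFrame(
    {"IVOL": [0.30, 0.25, 0.28, np.nan, np.nan, np.nan, 0.26, 0.22, 0.24]},
    index=idx,
)

# Pivot to a grid: rows = expiry, columns = strike
grid = long_df["IVOL"].unstack("OPT_STRIKE_PX")
expiry_dates = grid.index
# Numeric row coordinate: days since the first expiry (an assumption)
grid.index = (grid.index - grid.index[0]).days
filled = fill_grid(grid)
filled.index = expiry_dates  # put the dates back
```

`rescale=True` is an existing `griddata` parameter; it keeps the triangulation from being dominated by whichever axis has the larger raw units (days vs. strike). If a long-format result is needed again, `filled.stack()` turns the grid back into a Series indexed by (expiry, strike).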

And we can visually check that it worked as expected:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2)
ax[0].imshow(df.values)
ax[0].set_title('original')
ax[1].imshow(df_interpolated.values)
ax[1].set_title('interpolated')
plt.show()

Output: [image: side-by-side plots titled 'original' and 'interpolated']
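In this toy example every missing cell lies between two fully known rows, so nothing falls outside the data. With a sparse volatility grid, cells near the ends often lie outside the convex hull of the known points, and `griddata(..., method='linear')` returns NaN there (its default `fill_value`). One common workaround, sketched here, is to fill those leftover cells with `method='nearest'`. The helper name `fill_linear_then_nearest` and the sample grid are hypothetical, and nearest-neighbour filling is flat extrapolation, which may or may not be acceptable for implied volatilities.

```python
import numpy as np
from scipy.interpolate import griddata


def fill_linear_then_nearest(values: np.ndarray) -> np.ndarray:
    """Linear griddata inside the convex hull, nearest neighbour outside it."""
    rows, cols = np.indices(values.shape)
    known = ~np.isnan(values)
    points = (rows[known], cols[known])
    # Linear interpolation: cells outside the hull come back as NaN
    filled = griddata(points, values[known], (rows, cols), method="linear")
    outside = np.isnan(filled)
    # Nearest-neighbour fallback for whatever linear interpolation could not reach
    filled[outside] = griddata(points, values[known],
                               (rows[outside], cols[outside]), method="nearest")
    return filled


# Hypothetical sparse grid: the four corners lie outside the hull of known cells,
# while (1, 1) is missing but inside it
grid = np.array([
    [np.nan, 0.25, 0.24, np.nan],
    [0.27, np.nan, 0.22, 0.23],
    [0.26, 0.22, 0.21, 0.22],
    [np.nan, 0.21, 0.20, np.nan],
])
filled = fill_linear_then_nearest(grid)
```

The interior cell (1, 1) is filled by linear interpolation; only the corners fall back to their nearest known neighbour.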

score 0 · Answer 2 · answered Jan 09 '21 at 00:47

I think what you want is to convert the DataFrame to a numpy array, fill it, and convert back, which should be fairly simple. The code below fills the NaN entries of a simple array using linear interpolation. The output array becomes np.array([[1, 2, 3], [2, 3, 4], [5, 5.5, 6]]).

# note: interp2d was deprecated in SciPy 1.10 and removed in SciPy 1.14
from scipy.interpolate import interp2d
import numpy as np

# simple 2d array to interpolate
d = np.array([[1, 2, np.nan], [np.nan, 3, 4], [5, np.nan, 6]])

# find indices where values aren't NaN
valsX, valsY = np.where(~np.isnan(d))

# create an interpolation function from the values that aren't NaN
interp = interp2d(valsX, valsY, d[valsX, valsY])

# copy the original array in case you want to keep it
dprime = np.copy(d)

# indices that are NaN and need to be filled
nanX, nanY = np.where(np.isnan(d))

# evaluate the interpolation function at every NaN position
for i, j in zip(nanX, nanY):
    dprime[i, j] = interp(i, j)[0]  # interp returns a length-1 array
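Because `interp2d` no longer exists in current SciPy, here is one possible replacement, sketched under that assumption: `scipy.interpolate.LinearNDInterpolator`, which does piecewise-linear interpolation over a Delaunay triangulation of the known points (the same approach `griddata(..., method='linear')` uses). This is a different technique from `interp2d`'s spline fit, so the filled values are not guaranteed to match the array quoted in this answer.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# the same small array as in the answer
d = np.array([[1, 2, np.nan], [np.nan, 3, 4], [5, np.nan, 6]])

# (row, col) coordinates of the known and missing cells
known_r, known_c = np.nonzero(~np.isnan(d))
nan_r, nan_c = np.nonzero(np.isnan(d))

# piecewise-linear interpolant built from the known cells only
interp = LinearNDInterpolator(np.column_stack([known_r, known_c]),
                              d[known_r, known_c])

dprime = d.copy()
# fill all missing cells at once; cells outside the convex hull of the
# known cells get fill_value, which is NaN by default
dprime[nan_r, nan_c] = interp(nan_r, nan_c)
```

With this data, (2, 1) becomes 5.5 (halfway between 5 and 6 along the bottom row) and (1, 0) becomes 3 (halfway between 1 and 5), while (0, 2) stays NaN because it lies outside the convex hull of the known cells. If flat extrapolation is acceptable, `NearestNDInterpolator` is one way to fill such leftover cells.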
