
I am converting some code from Matlab to Python and found that I was getting different results from scipy.interpolate.griddata than from Matlab's scatteredInterpolant. After much research and experimentation I found that the interpolation results from scipy.interpolate.griddata seem to depend on the size of the data set provided. There seem to be thresholds that cause the interpolated value to change. Is this a bug? Or can someone explain the algorithm used that would cause this? Here is code that demonstrates the problem:

import numpy as np
from scipy import interpolate

# This code provides a simple example showing that the interpolated value 
# for the same location changes depending on the size of the input data set.

# Results of this example show that the interpolated value changes 
# at repeat 10 and 300.

def compute_missing_value(data):
    """Compute the missing value example function."""

    # Indices for valid x, y, and z data
    # In this example x and y are simply the row and column indices
    mask = ~np.isnan(data)
    valid_rows, valid_cols = np.where(mask)
    valid_data = data[mask]

    interpolated_value = interpolate.griddata(
        np.array((valid_rows, valid_cols)).T, valid_data, (2, 2),
        method='linear')

    print('Size=', data.shape, '  Value:', interpolated_value)


# Sample data
data = np.array([[0.2154, 0.1456, 0.1058, 0.1918],
                 [-0.0398, 0.2238, -0.0576, 0.3841],
                 [0.2485, 0.2644, 0.2639, 0.1345],
                 [0.2161, 0.1913, 0.2036, 0.1462],
                 [0.0540, 0.3310, 0.3674, 0.2862]])

# Larger data sets are created by tiling the original data.
# The location of the invalid data to be interpolated is maintained at 2,2
repeat_list = [1, 9, 10, 11, 30, 100, 300]
for repeat in repeat_list:
    new_data = np.tile(data, (1, repeat))
    new_data[2,2] = np.nan
    compute_missing_value(new_data)

The results are:

Size= (5, 4)   Value: 0.07300000000000001  
Size= (5, 36)   Value: 0.07300000000000001  
Size= (5, 40)   Value: 0.19945000000000002  
Size= (5, 44)   Value: 0.07300000000000001  
Size= (5, 120)   Value: 0.07300000000000001  
Size= (5, 400)   Value: 0.07300000000000001  
Size= (5, 1200)   Value: 0.19945000000000002
Gsk
dmueller
2 Answers


I think the explanation lies in the way scipy.interpolate.griddata constructs a triangulation of your data before interpolating. According to the documentation, it uses scipy.interpolate.LinearNDInterpolator, which constructs a Delaunay triangulation of the data, and that triangulation is not guaranteed to stay the same when you add more nodes at the edge of your grid (as you've done with numpy.tile). Because the 2D area can be divided into triangles in more than one way, the resulting linear interpolation may vary.

For the plain 5x4 grid, with the (2,2) element missing, the Delaunay triangulation produced by scipy.spatial.Delaunay looks like this:

[Figure: Delaunay triangulation of the 5x4 grid with the (2,2) node removed]

If you then tile the grid data, by the time you have four copies of the grid the Delaunay triangulation has changed around the (2,2) location, which now lies on a horizontal boundary rather than a vertical one:

[Figure: Delaunay triangulation of the tiled grid, with (2,2) now on a horizontal triangle edge]

This means that the interpolation at (2,2) uses a different set of neighbouring nodes, which gives a different interpolated value on the extended grid. (From a few quick experiments, this effect may not be present for 2x or 3x tiling, but it showed up on the 4x tiling.) The change in the layout of the triangles comes from the way the Delaunay triangulation is computed: the entire 2D grid is projected into a 3D space, the convex hull is computed there, and the result is projected back into 2D triangles. As you add more nodes to the grid, there is therefore no guarantee that the 3D convex hull will be identical even where it refers to the same nodes in the original 2D grid.
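This can be checked without plotting by asking scipy.spatial.Delaunay directly which triangle encloses the query point. The sketch below is my own code, not taken from the question; whichever triangle qhull picks for a given tiling, the point (2, 2) sits on one of its edges, so the interpolated value is the midpoint of either the vertical or the horizontal pair of neighbours:

```python
import numpy as np
from scipy import interpolate
import scipy.spatial as spatial

data = np.array([[0.2154, 0.1456, 0.1058, 0.1918],
                 [-0.0398, 0.2238, -0.0576, 0.3841],
                 [0.2485, 0.2644, 0.2639, 0.1345],
                 [0.2161, 0.1913, 0.2036, 0.1462],
                 [0.0540, 0.3310, 0.3674, 0.2862]])

results = {}
for repeat in (1, 4, 10):
    grid = np.tile(data, (1, repeat))
    grid[2, 2] = np.nan
    rows, cols = np.where(~np.isnan(grid))
    points = np.column_stack((rows, cols))

    # Triangle that the Delaunay triangulation assigns to the query point
    tess = spatial.Delaunay(points)
    simplex = int(tess.find_simplex(np.array([2.0, 2.0])))
    triangle = points[tess.simplices[simplex]]

    value = float(interpolate.griddata(points, grid[~np.isnan(grid)],
                                       (2, 2), method='linear'))
    results[repeat] = value
    print(repeat, triangle.tolist(), value)
```

Whatever triangulation qhull produces, the printed value should always be one of the two candidate midpoints, 0.0730 or 0.19945; which one you get for a given tiling depends on the qhull version and the node count, as described above.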

rwp
  • Ok. This explains how the two numbers are computed but why should the orientation of the triangle change with size of the array? I would be willing to accept either solution as valid, but it doesn't seem like the size of the data set should randomly change the triangulation and the resulting interpolation. I would expect consistent results. – dmueller Mar 25 '18 at 16:19
  • I've edited my answer to include a bit more explanation. Hopefully this goes some way to answering the question - perhaps further reading on the Delaunay algorithm would give you the complete picture. – rwp Mar 25 '18 at 17:24
  • Thanks. I guess I need to understand more about Delaunay triangulation. However, when I used matplotlib.tri.Triangulation and matplotlib.tri.LinearTriInterpolator I consistently got 0.0730. I may move to rbf interpolation once comparison with Matlab is complete. – dmueller Mar 25 '18 at 19:19
3

Jaime's answer describes how scipy.interpolate.griddata interpolates values using Delaunay triangulation:

[When] you make a call to scipy.interpolate.griddata:

  1. First, a call to sp.spatial.qhull.Delaunay is made to triangulate the irregular grid coordinates.
  2. Then, for each point in the new grid, the triangulation is searched to find in which triangle ... it lies.
  3. The barycentric coordinates of each new grid point with respect to the vertices of the enclosing simplex are computed.
  4. An interpolated value is computed for that grid point, using the barycentric coordinates and the values of the function at the vertices of the enclosing simplex.

pv. explains that the Delaunay triangulation generated by a square grid is not unique. Since the points used for linear interpolation depend on the triangulation, you can get different results depending on the particular Delaunay triangulation generated.
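The four steps quoted above can be reproduced by hand with scipy.spatial.Delaunay, whose transform attribute holds the affine map needed for barycentric coordinates. This is a sketch of the mechanism with my own variable names, not griddata's actual internals:

```python
import numpy as np
from scipy import interpolate
import scipy.spatial as spatial

data = np.array([[0.2154, 0.1456, 0.1058, 0.1918],
                 [-0.0398, 0.2238, -0.0576, 0.3841],
                 [0.2485, 0.2644, 0.2639, 0.1345],
                 [0.2161, 0.1913, 0.2036, 0.1462],
                 [0.0540, 0.3310, 0.3674, 0.2862]])
data[2, 2] = np.nan

mask = ~np.isnan(data)
rows, cols = np.where(mask)
points = np.column_stack((cols, rows)).astype(float)
values = data[mask]
query = np.array([2.0, 2.0])

# Step 1: triangulate the scattered points
tess = spatial.Delaunay(points)
# Step 2: find the triangle enclosing the query point
s = int(tess.find_simplex(query))
# Step 3: barycentric coordinates of the query point within that triangle
T = tess.transform[s]            # affine transform for simplex s
b = T[:2].dot(query - T[2])
bary = np.append(b, 1.0 - b.sum())
# Step 4: weight the vertex values by the barycentric coordinates
manual = values[tess.simplices[s]].dot(bary)

gd = float(interpolate.griddata(points, values, (2, 2), method='linear'))
print(manual, gd)
```

The two printed values agree: since linear interpolation is continuous across triangle edges, it does not matter which of the two adjacent triangles find_simplex returns for a point lying exactly on an edge.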

Here is a modified version of your script which draws the Delaunay triangulation used:

import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
import scipy.spatial as spatial
import matplotlib.collections as mcoll

def compute_missing_value(data):
    """Compute the missing value example function."""

    mask = ~np.isnan(data)
    valid_rows, valid_cols = np.where(mask)
    valid_data = data[mask]
    interpolated_value = interpolate.griddata(
        (valid_cols, valid_rows), valid_data, (2, 2), method='linear')

    print('Size: {:<12s} Value: {:<.4f}'.format(
        str(data.shape), interpolated_value))

    points = np.column_stack((valid_cols, valid_rows))

    tess = spatial.Delaunay(points)
    verts = tess.points[tess.simplices]
    lc = mcoll.LineCollection(verts, colors='black', linewidth=2, zorder=5)
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.add_collection(lc)

    ax.plot(valid_cols, valid_rows, 'ko')
    ax.set(xlim=(0, 3), ylim=(0, 3))
    plt.title('Size: {:<12s} Value: {:<.4f}'.format(
        str(data.shape), interpolated_value))

    for label, x, y in zip(valid_data, valid_cols, valid_rows):
        plt.annotate(
            label,
            xy=(x, y), xycoords='data',
            xytext=(-20, -40), textcoords='offset points',
            horizontalalignment='center',
            verticalalignment='bottom',
            bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
            arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))

    plt.show()


# Sample data
orig_data = np.array([[0.2154, 0.1456, 0.1058, 0.1918],
                 [-0.0398, 0.2238, -0.0576, 0.3841],
                 [0.2485, 0.2644, 0.2639, 0.1345],
                 [0.2161, 0.1913, 0.2036, 0.1462],
                 [0.0540, 0.3310, 0.3674, 0.2862]])

repeat_list = [1, 4]
for repeat in repeat_list:
    print('{}: '.format(repeat), end='')
    new_data = np.tile(orig_data, (1, repeat))
    new_data[2,2] = np.nan
    compute_missing_value(new_data)

[Figure: Delaunay triangulation for repeat=1, Size: (5, 4)]

[Figure: Delaunay triangulation for repeat=4, Size: (5, 16)]

As you can see, the two interpolated values, 0.1995 and 0.073, are the averages of the pairs (A,C) and (B,D) (using pv.'s notation):

In [159]: (0.2644+0.1345)/2
Out[159]: 0.19945000000000002

In [160]: (0.2036-0.0576)/2
Out[160]: 0.07300000000000001
unutbu