1

I need to perform an interpolation of some Nan values in a 2d numpy array, see for example the following picture:

enter image description here

In my current approach I use scipy.interpolate.griddata for the interpolation procedure. However I noticed that when mirroring the array on both axis i.e. d2 = d[::-1, ::-1] the interpolation gives different results. Here is a complete example :

import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as interp

def replace_outliers(f):
    mask = np.isnan(f)
    lx, ly = f.shape
    x, y = np.mgrid[0:lx, 0:ly]
    z = interp.griddata(np.array([x[~mask].ravel(),y[~mask].ravel()]).T,
                                          f[~mask].ravel(),
                                          (x,y), method='linear', fill_value=0)
    return z

def main():
    d = np.load('test.npy')
    d2 = d[::-1, ::-1]

    dn = replace_outliers(d)
    dn2 = replace_outliers(d2)

    print np.sum(dn - dn2[::-1, ::-1])

    plt.imshow(dn-dn2[::-1, ::-1], interpolation='nearest')
    plt.colorbar()
    plt.show()


if __name__=='__main__':
    main()

This gives the difference between the two interpolations:

enter image description here

or as evaluated by np.sum its about -62.7

So how can it be that a simple mirroring of the array gives different results in the interpolation procedure? Is there maybe something wrong with the coordinates I use ?

jrsm
  • 1,595
  • 2
  • 18
  • 39

1 Answers1

3

The reason likely is that the linear interpolation is triangle-based. However, such a square grid is a degenerate case for Delaunay triangulation, and the triangulation is not unique. I can imagine the outcome depends on the order of the data points.

For a missing data point, I would guess the two cases correspond to different triangulations of the empty space:

                           A                 A
*   *   *              *---*---*         *---*---*
                       | /   \ |         | / | \ | 
*       *        =>   D*-------*B   or  D*   |   *B
                       | \   / |         | \ | / |
*   *   *              *---*---*         *---*---*
                           C                 C

If you now compute the value at the center, you get (B+D)/2 from one triangulation and (A+C)/2 from the other.

pv.
  • 33,875
  • 8
  • 55
  • 49
  • Thank you for your reply, this could be indeed the case... but still the overall differene between the two cases should be smaller than above? It's of the same order as the field. – jrsm Nov 16 '16 at 13:40
  • By the way if there is a better way to perform this interpolation please let me know :) – jrsm Nov 16 '16 at 13:42
  • If the data is as above, I guess same order as the field is plausible difference between interpolating vertically vs horizontally. – pv. Nov 17 '16 at 19:46
  • Than the question is why not all NaNs, just certain are affected by this problem? Not all interpolations give a different result. The Number of Nans (white) is much larger than the errors in the interpolation. – jrsm Nov 17 '16 at 23:27