
I have a 2D array (or matrix, if you prefer) with some missing values represented as NaN. The missing values are typically in a strip along one axis, e.g.:

1   2   3 NaN   5
2   3   4 NaN   6
3   4 NaN NaN   7
4   5 NaN NaN   8
5   6   7   8   9

I would like to replace the NaNs with reasonably sensible numbers.

I looked into Delaunay triangulation, but found very little documentation.
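As a rough illustration of the Delaunay approach: triangulate the known cells, then fill each missing cell with the barycentric-weighted average of the three corners of the triangle that contains it. The sketch below is an assumption-laden example built on `scipy.spatial.Delaunay` (its `find_simplex`, `transform` and `simplices` attributes); the helper name `delaunay_linear_fill` is hypothetical. It is essentially the idea behind `scipy.interpolate.griddata(..., method='linear')`, which uses a Qhull triangulation with linear barycentric interpolation.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_linear_fill(a):
    """Hypothetical helper: fill NaN cells by linear (barycentric)
    interpolation on a Delaunay triangulation of the known cells.
    Cells outside the convex hull of the known cells are left as NaN."""
    a = np.asarray(a, dtype=float)
    out = a.copy()
    known = ~np.isnan(a)
    known_pts = np.argwhere(known).astype(float)   # (row, col) of known cells
    known_vals = a[known]                          # same row-major order as argwhere
    missing_idx = np.argwhere(~known)              # (row, col) of NaN cells
    if len(missing_idx) == 0:
        return out

    tri = Delaunay(known_pts)
    simplex = tri.find_simplex(missing_idx.astype(float))  # -1 = outside hull
    inside = simplex >= 0
    s = simplex[inside]
    p = missing_idx[inside].astype(float)

    # transform[s, :2] maps (p - transform[s, 2]) to the first two
    # barycentric coordinates; the third one is 1 minus their sum.
    T = tri.transform[s]                           # shape (m, 3, 2)
    b = np.einsum('ijk,ik->ij', T[:, :2, :], p - T[:, 2, :])
    bary = np.column_stack([b, 1.0 - b.sum(axis=1)])

    # Weighted sum of the values at the triangle's three vertices
    verts = tri.simplices[s]                       # shape (m, 3)
    vals = np.sum(known_vals[verts] * bary, axis=1)

    rows, cols = missing_idx[inside].T
    out[rows, cols] = vals
    return out
```

On data that is exactly a plane (like the example matrix above), linear interpolation on any triangulation reproduces the plane for cells inside the hull.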

I tried astropy's convolve, since it supports 2D arrays and is quite straightforward. The problem is that convolution is not interpolation: it pulls all values towards the local average (which could be mitigated by using a narrow kernel).

This question is meant as the natural 2D extension of this post: is there a way to interpolate over NaN/missing values in a 2D array?

M.T
  • There are many ways you could interpolate this. One difficulty is that your data is no longer rectangular, and many simple 2d interpolation algorithms require this, but it is still possible. Do you have any particular requirements for the interpolation? – Jeremy West Jun 06 '16 at 16:17
  • For example, this http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.interpolate.interp2d.html probably does what you need. Just pass in the data points that aren't NaN and then resample at the NaN ones after constructing the interpolation. – Jeremy West Jun 06 '16 at 16:26
  • Also, this question: http://stackoverflow.com/questions/5146025/python-scipy-2d-interpolation-non-uniform-data seems to be essentially the same. – Jeremy West Jun 06 '16 at 16:28
  • @JeremyWest Thank you very much for the links, I think [`griddata`](http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata) is close to what I was looking for. – M.T Jun 06 '16 at 18:37
  • A better approach than the answers below is using *inpainting* techniques. See for example: https://docs.opencv.org/3.4/df/d3d/tutorial_py_inpainting.html – Cris Luengo Jul 27 '23 at 16:26
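To give a feel for what inpainting means without pulling in OpenCV, here is a minimal sketch of the simplest variant, harmonic (diffusion) inpainting: known cells stay fixed and each missing cell is repeatedly replaced by the mean of its four neighbours until the values settle, which approximates a solution of Laplace's equation over the hole. This is not the Telea or Navier-Stokes algorithm used by OpenCV's `cv2.inpaint`; the function name `harmonic_inpaint` and its `n_iter`/`tol` parameters are hypothetical.

```python
import numpy as np


def harmonic_inpaint(a, n_iter=2000, tol=1e-8):
    """Hypothetical helper: fill NaNs so each missing cell equals the mean of
    its 4 neighbours (discrete Laplace equation), keeping known cells fixed.
    Plain Jacobi iteration; array borders are padded by edge replication."""
    a = np.asarray(a, dtype=float)
    missing = np.isnan(a)
    if not missing.any():
        return a.copy()
    out = a.copy()
    out[missing] = np.nanmean(a)        # initial guess for the unknown cells
    for _ in range(n_iter):
        p = np.pad(out, 1, mode='edge')
        # mean of up, down, left and right neighbours for every cell
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        delta = np.max(np.abs(avg[missing] - out[missing]))
        out[missing] = avg[missing]     # only the missing cells are updated
        if delta < tol:
            break
    return out
```

Jacobi iteration is slow for large holes; it is used here only because it is short and easy to check. Linear ramps are harmonic, so a hole inside a ramp is filled exactly (up to the tolerance).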

3 Answers


Yes, you can use scipy.interpolate.griddata together with a masked array. You choose the type of interpolation with the method argument; 'cubic' usually does an excellent job:

import numpy as np
from scipy import interpolate


# Let's create some random data (integers 0..10, as floats)
array = np.random.randint(0, 11, (10, 10)).astype(float)
# values greater than 7 become np.nan
array[array > 7] = np.nan

That looks something like this using plt.imshow(array, interpolation='nearest'):

[Image: the random array with NaN cells, displayed with plt.imshow]

x = np.arange(0, array.shape[1])
y = np.arange(0, array.shape[0])
#mask invalid values
array = np.ma.masked_invalid(array)
xx, yy = np.meshgrid(x, y)
#get only the valid values
x1 = xx[~array.mask]
y1 = yy[~array.mask]
newarr = array[~array.mask]

GD1 = interpolate.griddata((x1, y1), newarr.ravel(),
                           (xx, yy),
                           method='cubic')

This is the final result:

[Image: the interpolated array GD1]

Note that NaN cells at the edges that lie outside the convex hull of the valid points cannot be interpolated with 'linear' or 'cubic' and are kept as NaN. You can change this with the fill_value argument.
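If you would rather not keep NaNs at the edges, one option (a sketch, assuming nearest-neighbour values are acceptable there) is to run griddata with the chosen method and then fill whatever is still NaN, i.e. the cells outside the convex hull, with method='nearest'. The helper name `griddata_fill` is hypothetical.

```python
import numpy as np
from scipy import interpolate


def griddata_fill(a, method='cubic'):
    """Hypothetical helper: interpolate NaN cells with griddata(method=...),
    then fill cells outside the convex hull of the valid cells (returned as
    NaN, griddata's default fill_value) with the nearest valid value."""
    a = np.asarray(a, dtype=float)
    valid = ~np.isnan(a)
    yy, xx = np.indices(a.shape)          # row and column index grids
    pts = (xx[valid], yy[valid])
    vals = a[valid]
    out = a.copy()
    out[~valid] = interpolate.griddata(pts, vals, (xx[~valid], yy[~valid]),
                                       method=method)
    still_nan = np.isnan(out)
    if still_nan.any():
        out[still_nan] = interpolate.griddata(pts, vals,
                                              (xx[still_nan], yy[still_nan]),
                                              method='nearest')
    return out
```

Passing a constant such as fill_value=0 to griddata is the simpler alternative when a fixed value at the edges is acceptable.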

How would this work if there is a 3x3 region of NaN-values, would you get sensible data for the middle point?

It depends on your data, so you have to run some tests. For instance, you could deliberately mask some good data, interpolate it with different methods (cubic, linear, etc.), compute the difference between the interpolated values and the original values you masked, and pick the method that gives the smallest difference.

You can use something like this:

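# work on a plain float copy with NaNs (array is a masked array at this point)
data = array.filled(np.nan)
reference = data[3:6, 3:6].copy()   # values we are about to hide
data[3:6, 3:6] = np.nan             # hide them on purpose
valid = ~np.isnan(data)             # recompute the valid points AFTER hiding
x1, y1, newarr = xx[valid], yy[valid], data[valid]

for method in ['linear', 'nearest', 'cubic']:
    GD1 = interpolate.griddata((x1, y1), newarr, (xx, yy), method=method)
    # nanmean skips cells that were already NaN before hiding
    meandifference = np.nanmean(np.abs(reference - GD1[3:6, 3:6]))
    print(' %s interpolation difference: %s' % (method, meandifference))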

That gives something like this:

   linear interpolation difference: 4.88888888889
   nearest interpolation difference: 4.11111111111
   cubic interpolation difference: 5.99400137377

Of course, this is random data, so the results can vary a lot. The best thing to do is to run the test on a deliberately masked piece of your own dataset and see what happens.
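The same test can be wrapped in a small reusable helper. This is a sketch: `holdout_errors` is a hypothetical name, `hide` is a boolean mask you choose (for example a strip shaped like your real gaps), and the score is the mean absolute error on the hidden cells.

```python
import numpy as np
from scipy import interpolate


def holdout_errors(data, hide, methods=('nearest', 'linear', 'cubic')):
    """Hypothetical helper: hide the known cells selected by `hide`,
    re-interpolate them from the remaining known cells with griddata and
    return {method: mean absolute error}. NaN estimates are ignored."""
    data = np.asarray(data, dtype=float)
    hide = np.asarray(hide, dtype=bool) & ~np.isnan(data)  # only hide real values
    train = ~np.isnan(data) & ~hide
    yy, xx = np.indices(data.shape)
    pts = (xx[train], yy[train])
    errors = {}
    for m in methods:
        est = interpolate.griddata(pts, data[train], (xx[hide], yy[hide]),
                                   method=m)
        errors[m] = np.nanmean(np.abs(est - data[hide]))
    return errors
```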

G M
  • How would this work if there is a 3x3 region of NaN-values, would you get sensible data for the middle point? – M.T Sep 20 '16 at 18:00
  • @M.T Hi, I have edited the answer, to answer this question. – G M Sep 21 '16 at 08:22

For your convenience, here is a function implementing G M's answer.

from scipy import interpolate
import numpy as np

def interpolate_missing_pixels(
        image: np.ndarray,
        mask: np.ndarray,
        method: str = 'nearest',
        fill_value: float = 0
):
    """
    :param image: a 2D image
    :param mask: a 2D boolean image, True indicates missing values
    :param method: interpolation method, one of
        'nearest', 'linear', 'cubic'.
    :param fill_value: value used for missing pixels outside the
        convex hull of the known pixels.
        Default is 0. Has no effect for 'nearest'.
    :return: the image with missing values interpolated
    """
    h, w = image.shape[:2]
    xx, yy = np.meshgrid(np.arange(w), np.arange(h))

    known_x = xx[~mask]
    known_y = yy[~mask]
    known_v = image[~mask]
    missing_x = xx[mask]
    missing_y = yy[mask]

    interp_values = interpolate.griddata(
        (known_x, known_y), known_v, (missing_x, missing_y),
        method=method, fill_value=fill_value
    )

    interp_image = image.copy()
    interp_image[missing_y, missing_x] = interp_values

    return interp_image
Sam De Meyer

I'd actually go through this matrix manually, row by row. Whenever you encounter a run of NaNs, keep track of the number immediately before the run, the number immediately after it, and how many NaNs it contains. With those numbers you can overwrite the NaNs with linearly interpolated values yourself.
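A minimal sketch of this row-by-row idea, using `numpy.interp` instead of tracking the runs by hand: for each row, the NaN positions are interpolated linearly between the nearest valid values on either side. The helper name `fill_rows_linear` is hypothetical; note that `np.interp` holds the end values constant, so leading or trailing NaNs copy the first or last valid value of the row.

```python
import numpy as np


def fill_rows_linear(a):
    """Hypothetical helper: linearly interpolate NaN runs along each row.
    Leading/trailing NaNs take the nearest valid value in the row;
    rows with no valid value at all are left as NaN."""
    out = np.array(a, dtype=float)      # copy, so the input is untouched
    cols = np.arange(out.shape[1])
    for row in out:                     # each row is a view into out
        nan = np.isnan(row)
        if nan.any() and not nan.all():
            row[nan] = np.interp(cols[nan], cols[~nan], row[~nan])
    return out
```

For column-wise filling, apply it to the transposed array: `fill_rows_linear(a.T).T`.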

Everyone_Else