want to smooth a contour from a masked array

Question

I have a masked array that I pass to matplotlib.pyplot.contourf to draw temperature contours on a global map. I am trying to smooth the contours, but none of the solutions I found seems to handle masked arrays. I tested these:

- scipy.ndimage.gaussian_filter
- moving averages
- scipy.ndimage.zoom

None of them works (they all include the masked values in the calculation). Is there any way to smooth a contour drawn from a masked array?

I added this part after trying the proposed 'inpaint' solution; the results were unchanged. Here is the code (if it helps):

import Scientific.IO.NetCDF as S
import mpl_toolkits.basemap as bm
import numpy.ma as MA
import numpy as np
import matplotlib.pyplot as plt
import inpaint

def main():

    fileobj = S.NetCDFFile('Bias.ANN.tas_A1_1.nc', mode='r')

    # take the values
    set1 = {'time', 'lat', 'lon'}
    set2 = set(fileobj.variables.keys())
    set3 = set2 - set1
    datadim = set3.pop()
    print "******************datadim: "+datadim
    data = fileobj.variables[datadim].getValue()[0,:,:]


    lon = fileobj.variables['lon'].getValue()
    lat = fileobj.variables['lat'].getValue()
    fileobj.close()


    data, lon = bm.shiftgrid(180.,data, lon,start=False)
    data = MA.masked_equal(data, 1.0e20)
    #data2 = inpaint.replace_nans(data, 10, 0.25, 2, 'idw')
    #- Make 2-D longitude and latitude arrays:

    [lon2d, lat2d] =np.meshgrid(lon, lat)


    #- Set up map:

    mapproj = bm.Basemap(projection='cyl', 
                       llcrnrlat=-90.0, llcrnrlon=-180.00,
                       urcrnrlat=90.0, urcrnrlon=180.0)
    mapproj.drawcoastlines(linewidth=.5)
    mapproj.drawmapboundary(fill_color='.8')
    #mapproj.drawparallels(N.array([-90, -45, 0, 45, 90]), labels=[1,0,0,0])
    #mapproj.drawmeridians(N.array([0, 90, 180, 270, 360]), labels=[0,0,0,1])
    lonall, latall = mapproj(lon2d, lat2d)

    cmap=plt.cm.Spectral
    #- Make a contour plot of the temperature:
    mymapf = plt.contourf(lonall, latall, data, 20, cmap=cmap)
    #plt.clabel(mymapf, fontsize=12)
    plt.title(cmap.name)
    plt.colorbar(mymapf, orientation='horizontal')

    plt.savefig('sample2.png', dpi=150, edgecolor='red', format='png', bbox_inches='tight', pad_inches=.2)
    plt.close()
if __name__ == "__main__":
  main()

I am comparing the output of this code (the first figure) with Panoply's output for the same data file (the second figure). Zooming in and looking more closely, it seems that this is not really a smoothness problem: the pyplot plot is one stripe slimmer, i.e. the contours are cut off earlier (the outer boundaries show this clearly, and the inner contours differ because of it). This makes the pyplot plot look less smooth than the Panoply one. How can I get (nearly) the same plot? Am I diagnosing this correctly?

the outcome from pyplot

the outcome from Panoply

Are you sure that you're plotting the same data in both? The underlying data appears to be at a different resolution in the second plot. — Joe Kington, Jun 17 '13 at 13:45
Have a look at [this post](http://stackoverflow.com/questions/5551286/filling-gaps-in-a-numpy-array), too. — j08lue, Jan 31 '14 at 14:21

score 4 · Answer 1 · answered Jun 15 '13 at 15:23

I had a similar problem, and Google pointed me to this blog post. Basically, the author uses an inpainting algorithm to interpolate the missing values and produce a valid array for filtering.

The code is at the end of the post; you can save it to site-packages (or elsewhere on your path) and load it as a module (i.e. save it as inpaint.py):

import inpaint

filled = inpaint.replace_nans(NANMask, 5, 0.5, 2, 'idw')

I'm happy with the result, and I guess it will suit missing temperature values just fine. There is also a newer version here: github, but the code will need some cleaning for general use, as it's part of a project.
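One detail that is easy to miss: `replace_nans` only replaces NaN entries (it locates them with `np.isnan`), so the masked entries of a `numpy.ma` array have to be turned into NaN first, otherwise nothing is filled. Below is a minimal sketch of the whole workflow under my own assumptions: `smooth_after_filling` and `fill_func` are hypothetical names, `fill_func` stands in for a call such as `lambda a: inpaint.replace_nans(a, 5, 0.5, 2, 'idw')`, and the Gaussian filter is only one possible smoother.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smooth_after_filling(masked, fill_func, sigma=1.0):
    """Fill masked points, smooth, then put the original mask back.

    `fill_func` is any callable that takes a 2-D float array containing
    NaNs and returns an array of the same shape without them
    (hypothetical hook, e.g. a wrapper around inpaint.replace_nans).
    """
    masked = np.ma.asarray(masked, dtype=float)
    mask = np.ma.getmaskarray(masked)

    # masked entries (e.g. the 1.0e20 fill value) become NaN in a plain ndarray
    with_nans = masked.filled(np.nan)

    # gap filling: after this step the array must be NaN-free
    filled = fill_func(with_nans)

    # any ordinary filter can be used now, since there are no gaps left
    smoothed = gaussian_filter(filled, sigma)

    # restore the mask so the filled-in values are never plotted
    return np.ma.array(smoothed, mask=mask)
```

The returned masked array can be passed to `plt.contourf` in place of the original data.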

For reference, ease of use, and preservation's sake, I'll post the code (of the initial version) here:

# -*- coding: utf-8 -*-

"""A module for various utilities and helper functions"""

import numpy as np
#cimport numpy as np
#cimport cython

DTYPEf = np.float64
#ctypedef np.float64_t DTYPEf_t

DTYPEi = np.int32
#ctypedef np.int32_t DTYPEi_t

#@cython.boundscheck(False) # turn off bounds-checking for entire function
#@cython.wraparound(False) # turn off negative-index wraparound for entire function
def replace_nans(array, max_iter, tol,kernel_size=1,method='localmean'):
    """Replace NaN elements in an array using an iterative image inpainting algorithm.
The algorithm is the following:
1) For each element in the input array, replace it by a weighted average
of the neighbouring elements which are not NaN themselves. The weights depend
on the method type. If ``method=localmean``, the weights are equal to 1/( (2*kernel_size+1)**2 -1 )
2) Several iterations are needed if there are adjacent NaN elements.
If this is the case, information is "spread" from the edges of the missing
regions iteratively, until the variation is below a certain threshold.
Parameters
----------
array : 2d np.ndarray
an array containing NaN elements that have to be replaced
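max_iter : int
the maximum number of iterations
tol : float
iteration stops once the mean squared change of the replaced values drops below this value
kernel_size : int
the size of the kernel, default is 1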
method : str
the method used to replace invalid values. Valid options are
`localmean`, 'idw'.
Returns
-------
filled : 2d np.ndarray
a copy of the input array, where NaN elements have been replaced.
"""

#    cdef int i, j, I, J, it, n, k, l
#    cdef int n_invalids

    filled = np.empty( [array.shape[0], array.shape[1]], dtype=DTYPEf)
    kernel = np.empty( (2*kernel_size+1, 2*kernel_size+1), dtype=DTYPEf )

#    cdef np.ndarray[np.int_t, ndim=1] inans
#    cdef np.ndarray[np.int_t, ndim=1] jnans

    # indices where array is NaN
    inans, jnans = np.nonzero( np.isnan(array) )

    # number of NaN elements
    n_nans = len(inans)

    # arrays which contain replaced values to check for convergence
    replaced_new = np.zeros( n_nans, dtype=DTYPEf)
    replaced_old = np.zeros( n_nans, dtype=DTYPEf)

    # depending on kernel type, fill kernel array
    if method == 'localmean':

        print('kernel_size', kernel_size)
        for i in range(2*kernel_size+1):
            for j in range(2*kernel_size+1):
                kernel[i,j] = 1
        print(kernel, 'kernel')

    elif method == 'idw':
        # fixed 5x5 kernel, so this method assumes kernel_size=2
        kernel = np.array([[0, 0.5, 0.5, 0.5,0],
                  [0.5,0.75,0.75,0.75,0.5], 
                  [0.5,0.75,1,0.75,0.5],
                  [0.5,0.75,0.75,0.75,0.5],
                  [0, 0.5, 0.5 ,0.5 ,0]])
        print(kernel, 'kernel')
    else:
        raise ValueError( 'method not valid. Should be one of `localmean`, `idw`.')

    # fill new array with input elements
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            filled[i,j] = array[i,j]

    # make several passes
    # until we reach convergence
    for it in range(max_iter):
        print('iteration', it)
        # for each NaN element
        for k in range(n_nans):
            i = inans[k]
            j = jnans[k]

            # initialize to zero
            filled[i,j] = 0.0
            n = 0

            # loop over the kernel
            for I in range(2*kernel_size+1):
                for J in range(2*kernel_size+1):

                    # if we are not out of the boundaries
                    if i+I-kernel_size < array.shape[0] and i+I-kernel_size >= 0:
                        if j+J-kernel_size < array.shape[1] and j+J-kernel_size >= 0:

                            # if the neighbour element is not NaN itself.
                            if filled[i+I-kernel_size, j+J-kernel_size] == filled[i+I-kernel_size, j+J-kernel_size] :

                                # do not sum itself (skip only the centre element)
                                if I-kernel_size != 0 or J-kernel_size != 0:

                                    # convolve kernel with original array
                                    filled[i,j] = filled[i,j] + filled[i+I-kernel_size, j+J-kernel_size]*kernel[I, J]
                                    n = n + 1*kernel[I,J]

            # divide value by effective number of added elements
            if n != 0:
                filled[i,j] = filled[i,j] / n
                replaced_new[k] = filled[i,j]
            else:
                filled[i,j] = np.nan

        # check if mean square difference between values of replaced
        # elements is below a certain tolerance
        print('tolerance', np.mean( (replaced_new-replaced_old)**2 ))
        if np.mean( (replaced_new-replaced_old)**2 ) < tol:
            break
        else:
            for l in range(n_nans):
                replaced_old[l] = replaced_new[l]

    return filled


def sincinterp(image, x,  y, kernel_size=3 ):
    """Re-sample an image at intermediate positions between pixels.
This function uses a cardinal interpolation formula which limits
the loss of information in the resampling process. It uses a limited
number of neighbouring pixels.
The new image :math:`im^+` at fractional locations :math:`x` and :math:`y` is computed as:
.. math::
im^+(x,y) = \sum_{i=-\mathtt{kernel\_size}}^{i=\mathtt{kernel\_size}} \sum_{j=-\mathtt{kernel\_size}}^{j=\mathtt{kernel\_size}} \mathtt{image}(i,j) sin[\pi(i-\mathtt{x})] sin[\pi(j-\mathtt{y})] / \pi(i-\mathtt{x}) / \pi(j-\mathtt{y})
Parameters
----------
image : np.ndarray, dtype np.int32
the image array.
x : two dimensions np.ndarray of floats
an array containing fractional pixel row
positions at which to interpolate the image
y : two dimensions np.ndarray of floats
an array containing fractional pixel column
positions at which to interpolate the image
kernel_size : int
interpolation is performed over a ``(2*kernel_size+1)*(2*kernel_size+1)``
submatrix in the neighbourhood of each interpolation point.
Returns
-------
im : np.ndarray, dtype np.float64
the interpolated value of ``image`` at the points specified
by ``x`` and ``y``
"""

    # indices
#    cdef int i, j, I, J

    # the output array
    r = np.zeros( [x.shape[0], x.shape[1]], dtype=DTYPEf)

    # pi
    pi = np.pi

    # for each point of the output array
    for I in range(x.shape[0]):
        for J in range(x.shape[1]):

            #loop over all neighbouring grid points
            for i in range( int(x[I,J])-kernel_size, int(x[I,J])+kernel_size+1 ):
                for j in range( int(y[I,J])-kernel_size, int(y[I,J])+kernel_size+1 ):
                    # check that we are in the boundaries
                    if i >= 0 and i < image.shape[0] and j >= 0 and j < image.shape[1]:
                        if (i-x[I,J]) == 0.0 and (j-y[I,J]) == 0.0:
                            r[I,J] = r[I,J] + image[i,j]
                        elif (i-x[I,J]) == 0.0:
                            r[I,J] = r[I,J] + image[i,j] * np.sin( pi*(j-y[I,J]) )/( pi*(j-y[I,J]) )
                        elif (j-y[I,J]) == 0.0:
                            r[I,J] = r[I,J] + image[i,j] * np.sin( pi*(i-x[I,J]) )/( pi*(i-x[I,J]) )
                        else:
                            r[I,J] = r[I,J] + image[i,j] * np.sin( pi*(i-x[I,J]) )*np.sin( pi*(j-y[I,J]) )/( pi*pi*(i-x[I,J])*(j-y[I,J]))
    return r


#cdef extern from "math.h":
 #   double sin(double)

It's worth mentioning that the code you pasted in is meant to be Cython. If you run it as python code, it's a very inefficient implementation. (It will still work fine, of course, it will just be slow.) You could implement the same algorithm with `scipy.ndimage`, if you wanted to avoid the inner loops being in python. — Joe Kington, Jun 15 '13 at 15:34
Good point. Advanced user will know what to do. As for the speed, code runs just fine, I'd expect zooming the array to be slower than this inpainting interpolation. — theta, Jun 15 '13 at 15:40
For what it's worth, zooming is only slower if there's only one `NaN`. For most cases (10-20% NaN's) zooming by a factor of 2 is 10-20x faster than inpainting, at least on my system. (The speed of the inpaint function depends strongly on the number of `NaN`s, while the speed of zooming is constant for a given array size and zoom level.) Again, that's with that particular implementation of inpainting. Implementing the same algorithm using `ndimage` should speed things up significantly. — Joe Kington, Jun 15 '13 at 15:51
theta and @JoeKington thank you so much for your help. Unfortunately it did not work for me. I will edit my first post to add the code, along with two images: the first from my code and the second the output from Panoply for the same data set. Your ideas would be greatly appreciated. Thank you again — BobbyF, Jun 16 '13 at 18:16
The `inpaint` function should replace NaN values with interpolated values so that scipy filters won't complain when fed the data. If I understand your problem correctly, you are now having problems with boundaries. If that's the case, maybe you should try the new version (github link in my answer), as I never used this algorithm on data with large holes, and such issues are reported as corrected by its author here: [blogpost](http://astrolitterbox.blogspot.de/2012/08/the-new-inpainting-script.html) — theta, Jun 16 '13 at 19:29
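
To follow up on the `scipy.ndimage` suggestion in the comments above, here is a rough sketch of a vectorised local-mean fill. It is not the code from the blog post: `fill_nans_localmean` and its default arguments are names and values I made up, and it updates all NaN cells at once in each pass rather than one by one, so the results can differ slightly from `replace_nans`.

```python
import numpy as np
from scipy import ndimage


def fill_nans_localmean(array, max_iter=100, tol=1e-6, kernel_size=1):
    """Iteratively replace NaNs by the mean of their non-NaN neighbours."""
    filled = np.array(array, dtype=float)  # work on a copy
    nan_mask = np.isnan(filled)
    if not nan_mask.any():
        return filled

    # square kernel of ones with the centre excluded ("do not sum itself")
    size = 2 * kernel_size + 1
    kernel = np.ones((size, size))
    kernel[kernel_size, kernel_size] = 0.0

    previous = np.zeros(nan_mask.sum())
    for _ in range(max_iter):
        known = ~np.isnan(filled)
        # sum of known neighbours and number of known neighbours per cell
        total = ndimage.convolve(np.where(known, filled, 0.0), kernel,
                                 mode="constant", cval=0.0)
        count = ndimage.convolve(known.astype(float), kernel,
                                 mode="constant", cval=0.0)

        # only originally-NaN cells with at least one known neighbour change
        update = nan_mask & (count > 0)
        filled[update] = total[update] / count[update]

        # stop once the replaced values no longer change (mean squared difference)
        current = np.nan_to_num(filled[nan_mask])
        if np.mean((current - previous) ** 2) < tol:
            break
        previous = current

    return filled
```

Each pass costs two convolutions regardless of how many NaNs there are, which is why this form tends to be faster in pure Python than looping over every NaN.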

tom10 · Answer 2 · 2013-06-18T21:04:49.633

A simple smoothing function that works with masked data will solve this. One can then avoid the approaches that involve making up data (ie, interpolating, inpainting, etc); and making up data should always be avoided.

The main issue that arises when smoothing masked data is that for each point, smoothing uses the neighboring values to calculate a new value at a center point, but when those neighbors are masked, the new value for the center point will also become masked due to the rules of masked arrays. Therefore, one needs to do the calculation with unmasked data, and explicitly account for the mask. That's easy to do, and is done in the function smooth below.

from numpy import *
import pylab as plt

#  make a grid and a striped mask as test data
N = 100
x = linspace(0, 5, N, endpoint=True)
grid = 2. + 1.*(sin(2*pi*x)[:,newaxis]*sin(2*pi*x)>0.)
m = resize((sin(pi*x)>0), (N,N))

plt.imshow(grid.copy(), cmap='jet', interpolation='nearest')
plt.colorbar()
plt.title('original data')


def smooth(u, mask):
    m = ~mask
    r = u*m  # set all 'masked' points to 0. so they aren't used in the smoothing
    a = 4*r[1:-1,1:-1] + r[2:,1:-1] + r[:-2,1:-1] + r[1:-1,2:] + r[1:-1,:-2]
    b = 4*m[1:-1,1:-1] + m[2:,1:-1] + m[:-2,1:-1] + m[1:-1,2:] + m[1:-1,:-2]  # a divisor that accounts for masked points
    b[b==0] = 1.  # for avoiding divide by 0 error (region is masked so value doesn't matter)
    u[1:-1,1:-1] = a/b

# run the data through the smoothing filter a few times
for i in range(10):   
    smooth(grid, m)

mg = ma.array(grid, mask=m)  # put together the mask and the data

plt.figure()
plt.imshow(mg, cmap='jet', interpolation='nearest')
plt.colorbar()
plt.title('smoothed with mask')

plt.show()

(figure: output of the code above)

The main point is that at the boundary of the mask, the masked values are not used in the smoothing. (This is also where the grid squares switch values, so it would be clear in the figure if the masked neighboring values were being used.)
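
The same normalisation trick carries over to other kernels. As a sketch (not part of the original answer), here is a Gaussian version built on `scipy.ndimage.gaussian_filter`: smooth the zero-filled data and the 0/1 validity weights separately, then divide one by the other. The name `masked_gaussian_smooth` and the choice of `mode="nearest"` are my own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def masked_gaussian_smooth(data, sigma=1.0):
    """Gaussian-smooth a 2-D masked array without using masked values."""
    data = np.ma.asarray(data, dtype=float)
    mask = np.ma.getmaskarray(data)

    valid = (~mask).astype(float)   # 1 where data is usable, 0 where masked
    values = data.filled(0.0)       # masked points contribute nothing to the sum

    # weighted sum of valid neighbours, and the sum of their weights
    num = gaussian_filter(values, sigma, mode="nearest")
    den = gaussian_filter(valid, sigma, mode="nearest")

    # den is 0 only where the whole neighbourhood is masked
    with np.errstate(invalid="ignore", divide="ignore"):
        smoothed = num / den

    # keep the original mask so masked regions stay blank in the plot
    return np.ma.array(smoothed, mask=mask | (den == 0))
```

Unlike `smooth`, this returns a new masked array instead of modifying its input, and one call with a larger `sigma` replaces the repeated passes.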

score 0 · Answer 3 · answered Apr 01 '21 at 15:15

We also just had this problem and the astropy package has us covered:

import numpy as np
import matplotlib.pyplot as plt

# Some Axes
x = np.arange(100)
y = np.arange(100)
#Some Interesting Shape
z = np.array(np.outer(np.sin((x+y)/10),np.sin(y/3)),dtype=float)

# some mask
mask = np.outer(np.sin((x+y)/20),np.sin(y/5))**2>.9

# masked data represent noise, so let's put some trash into the masked points
z[mask] = (np.random.random(size = (100,100))*10)[mask]

# masked data
z_masked = np.ma.masked_array(z, mask)

# "Conventional" filter
filter_kernelsize = 2

import scipy.ndimage
z_filtered_bad = scipy.ndimage.gaussian_filter(z_masked,filter_kernelsize)

# Let's filter it
from astropy.convolution import convolve, Gaussian2DKernel
k = Gaussian2DKernel(1.5)

z_filtered = convolve(z_masked, k, boundary='extend')

### Plots:
fig, axes = plt.subplots(2,2)
plt.sca(axes[0,0])
plt.title('Raw Data')
plt.imshow(z)
plt.colorbar()

plt.sca(axes[0,1])
plt.title('Raw Data Masked')
plt.imshow(z_masked)
plt.colorbar()

plt.sca(axes[1,0])
plt.title('ndimage filter (ignores mask)')
plt.imshow(z_filtered_bad)
plt.colorbar()

plt.sca(axes[1,1])
plt.title('astropy filter (uses mask)')
plt.imshow(z_filtered)
plt.colorbar()

plt.tight_layout()

Output plot of the code
