0

I have been trying to work out how to replicate IDL's smooth function in Python and I just can't get anything like the same results. (Disclaimer: It is probably 10 years since I touched this kind of mathematical problem so it has been dumped to make way for information like where to find the cheapest local fuel). I am trying to code this:

smooth(b,w,/nan)

where b is a 2D float array containing NANs (zeros - missing data - have also been converted to NAN).

From the IDL documents, it appears smooth uses a boxcar, so from scipy.ndimage.filters I have tried:

bsmooth = uniform_filter(b, w)

I am aware that there are some fundamental differences here:

  1. the default edge behaviour from IDL is "the end points are copied from the original array to the result with no smoothing" whereas I don't seem to have the option to do this with the uniform filter.
  2. Treatment of the NaN elements. In IDL, the /nan keyword seems to mean that where possible the NaN values will be filled by the result of the other points in the window. If there are no valid points to generate a result, by a MISSING keyword. I thought I could approximate this behaviour following the smoothing using scipy.interpolate's NearestNDInterpolator (thanks to the brilliant explanation by Alex on here: filling gaps on an image using numpy and scipy)

Here is my test array:

 >>>b                                                                                                                array([[ 0.97599638,  0.93114936,  0.87070072,  0.5379253 ],                                                              
       [ 0.34873217,         nan,  0.40985891,  0.22407863],                                                              
       [        nan,         nan,         nan,  0.67532134],                                                              
       [        nan,         nan,  0.85441768,         nan]])  

My answers bore not the SLIGHTEST resemblance to IDL, whether I use the /nan keyword or not.

IDL> smooth(b,2,/nan)
      0.97599638      0.93114936      0.87070072      0.53792530
      0.34873217      0.70728749      0.60817236      0.22407863
             NaN      0.53766960      0.54091913      0.67532134
             NaN             NaN      0.85441768             NaN

IDL> smooth(b,2)
      0.97599638      0.93114936      0.87070072      0.53792530
      0.34873217            -NaN            -NaN      0.22407863
            -NaN            -NaN            -NaN      0.67532134
            -NaN            -NaN      0.85441768             NaN

I confess I find the scipy documentation rather sparse on detail so I have no idea if I am really doing what I think I doing. The fact that the two python approaches which I believed would both smooth the image give different answers suggests that things are not what I understood them to be.

>>>uniform_filter(b, 2)
array([[ 0.97599638,  0.95357287,  0.90092504,  0.70431301],
       [ 0.66236428,         nan,         nan,         nan],
       [        nan,         nan,         nan,         nan],
       [        nan,         nan,         nan,         nan]])    

I thought it was a bit odd it was so empty so I tried this with an array of 100 elements (still using a window of 2) and output the images. The results (first image is 'b' second is 'bsmooth') are not quite what I was hoping for:

![b

![bsmooth

Going back to the smaller array and following the examples in: http://scipy.github.io/old-wiki/pages/Cookbook/SignalSmooth which I thought would give the same output as uniform_filter, I tried:

>>> box = np.array([1,1,1,1])
>>> box = box.reshape(2,2)
>>> box
array([[1, 1],
       [1, 1]])
>>> bsmooth = scipy.signal.convolve2d(b,box,mode='same')
>>> print bsmooth
[[ 0.97599638  1.90714574  1.80185008  1.40862602]
 [ 1.32472855         nan         nan  2.04256356]
 [        nan         nan         nan         nan]
 [        nan         nan         nan         nan]]

Obviously I have completely misunderstood the scipy functions, maybe even the IDL one. If anyone can help me to replicate the IDL smooth function as closely as possible, I would be extremely grateful. I am under considerable time pressure to get a solution for this that doesn't rely on IDL and I am tossing a coin to decide whether to code the function from scratch or develop a very contagious illness.

How can I perform the same smoothing in python?

Community
  • 1
  • 1
emmalg
  • 346
  • 2
  • 12
  • This the IDL alg? http://northstar-www.dartmouth.edu/doc/idl/html_6.2/SMOOTH.html – Lee Mar 02 '16 at 18:56
  • http://stackoverflow.com/questions/30532216/boxcar-averaging-algorithm-of-the-specified-weight – Lee Mar 02 '16 at 19:00
  • That's the one. I don't know how I missed the question I've been searching and searching. Very helpful question though. – emmalg Mar 02 '16 at 21:26

2 Answers2

1

First: Please use matplotlib.pyplot.imshow with interpolation="none" that's nicer to look at and maybe with greyscale.

So for your example: There is actually no convolution (filter) within scipy and numpy that treat's NaN as missing values (they propagate them within the convolution). At least I've found none so far and your boundary-treatement is also (to my knowledge) not implemented. But the boundary could be just replaced afterwards.

If you want to do convolution with NaN you can for example use astropy.convolution.convolve. There NaNs are interpolated using the kernel of your filter. But their convolution has some drawbacks as well: Border handling like you want isn't implemented there neither and your kernel must be of odd shape and the sum of your kernel must not be zero (or very close to it)

For example:

from astropy.convolution import convolve
import numpy as np
array = np.random.uniform(10,100, (4,4))
array[1,1] = np.nan
kernel = np.ones((3,3))
convolve(array, kernel)

as an example an initial array of

array([[ 97.19514587,  62.36979751,  93.54811286,  30.23567842],
       [ 51.02184613,          nan,  46.14769821,  60.08088041],
       [ 20.86482452,  42.39661484,  36.96961278,  96.89180175],
       [ 45.54453509,  76.61274347,  46.44485141,  25.40985372]])

will become:

array([[ 266.9009961 ,  406.59680717,  348.69637399,  230.01236989],
       [ 330.16243546,  506.82785931,  524.95440336,  363.87378443],
       [ 292.75477064,  422.31693304,  487.26826319,  311.94469828],
       [ 185.41871792,  268.83318211,  324.72547798,  205.71611967]])

if you want to "normalize" it, astropy offers the normalize_kernel parameter:

convolved = convolve(array, kernel, normalize_kernel=True)
array([[ 29.58753936,  42.09982189,  49.31793529,  33.00203873],
       [ 49.87040638,  65.67695002,  66.10447436,  40.44026448],
       [ 52.51126383,  63.03914444,  60.85474739,  35.88011742],
       [ 39.40188443,  46.82350749,  40.1380926 ,  22.46090152]])

If you want to replace the "edge" values with the ones from the original array just replace them:

convolved[0,:] = array[0,:]
convolved[-1,:] = array[-1,:]
convolved[:,0] = array[:,0]
convolved[:,-1] = array[:,-1]

So that's what the existing packages offer (as far as I know it). If you want to learn a bit of Cython or numba you can easily write your own convolutions that is not much slower (only a factor of 2-10) than the numpy/scipy ones but does EXACTLY what you want without messing around.

MSeifert
  • 145,886
  • 38
  • 333
  • 352
  • Thanks, I've finally worked out how the IDL smooth routine is working and it would give a different value compared to the normalised one (~56.314 where the nan value is). I'm just embarking on writing it myself so will post it when done. – emmalg Mar 03 '16 at 11:32
0

Since this is not something that is available in the python packages and because I saw the question asked several times during my research without satisfactory answers, here is how I solved the issue.

Provided is a test version of my function that I'm off to tidy up. I am sure there will be better ways to do the things I have done as I'm still fairly new to Python - please do recommend any appropriate changes.

Plots use autumn colourmap just because it allowed me to see the NaNs clearly.

My results:

IDL propagate
     0.033369284     0.067915268      0.96602046      0.85623550
      0.30435592             NaN             NaN       100.00000
      0.94065958             NaN             NaN      0.90966976
     0.018516513     0.044460904     0.051047217             NaN

python propagate
[[  3.33692829e-02   6.79152655e-02   9.66020487e-01   8.56235492e-01]
 [  3.04355923e-01              nan              nan   1.00000000e+02]
 [  9.40659566e-01              nan              nan   9.09669768e-01]
 [  1.85165123e-02   4.44609040e-02   5.10472165e-02              nan]]


IDL replace
     0.033369284     0.067915268      0.96602046      0.85623550
      0.30435592      0.47452110       14.829881       100.00000
      0.94065958      0.33833817       17.002417      0.90966976
     0.018516513     0.044460904     0.051047217             NaN

python replace
[[  3.33692829e-02   6.79152655e-02   9.66020487e-01   8.56235492e-01]
 [  3.04355923e-01   4.74521092e-01   1.48298812e+01   1.00000000e+02]
 [  9.40659566e-01   3.38338177e-01   1.70024175e+01   9.09669768e-01]
 [  1.85165123e-02   4.44609040e-02   5.10472165e-02              nan]]

My function:

#!/usr/bin/env python

# smooth.py  
__version__ = 0.1

# Version 0.1    29 Feb 2016    ELH    Test release

import numpy as np
import matplotlib.pyplot as mp

def Smooth(v1, w, nanopt): 

    # v1 is the input 2D numpy array.
    # w is the width of the square window along one dimension
    # nanopt can be replace or propagate 

    '''
    v1 = np.array(
    [[3.33692829e-02, 6.79152655e-02, 9.66020487e-01, 8.56235492e-01], 
    [3.04355923e-01, np.nan        , 4.86013025e-01, 1.00000000e+02],
    [9.40659566e-01, 5.23314093e-01, np.nan        , 9.09669768e-01],
    [1.85165123e-02, 4.44609040e-02, 5.10472165e-02, np.nan ]])
    w = 2
    '''

    mp.imshow(v1, interpolation='None', cmap='autumn')
    mp.show()

    # make a copy of the array for the output:
    vout=np.copy(v1)

    # If w is even, add one
    if w % 2 == 0:
        w = w + 1

    # get the size of each dim of the input:
    r,c = v1.shape

    # Assume that w, the width of the window is always square.
    startrc = (w - 1)/2
    stopr = r - ((w + 1)/2) + 1
    stopc = c - ((w + 1)/2) + 1

    # For all pixels within the border defined by the box size, calculate the average in the window.
    # There are two options:
    # Ignore NaNs and replace the value where possible.
    # Propagate the NaNs

    for col in range(startrc,stopc):
        # Calculate the window start and stop columns
        startwc = col - (w/2) 
        stopwc = col + (w/2) + 1
        for row in range (startrc,stopr):
            # Calculate the window start and stop rows
            startwr = row - (w/2)
            stopwr = row + (w/2) + 1
            # Extract the window
            window = v1[startwr:stopwr, startwc:stopwc]
            if nanopt == 'replace':
                # If we're replacing Nans, then select only the finite elements
                window = window[np.isfinite(window)]
            # Calculate the mean of the window
            vout[row,col] = np.mean(window)

    mp.imshow(vout, interpolation='None', cmap='autumn')
    mp.show()

    return vout
emmalg
  • 346
  • 2
  • 12