Find consecutive unmasked values

Question

I have a large 3 dimensional (time, longitude, latitude) input array. Most of the entries are masked. I need to find those entries where the mask is False for longer than a specific number of consecutive time steps (which I call threshold here). The result should be a mask with the same shape as the input mask.

Here is some pseudo-code to hopefully make clearer what I mean:

new_mask = find_consecutive(mask, threshold=3)
mask[:, i_lon, i_lat]
# [1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
new_mask[:, i_lon, i_lat]
# [1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

EDIT:

I'm not sure if my approach so far make sense. It does good performance-wise and gives me a labeled array and knowledge about which labels I want. I just couldn't figure out an efficient way to transform labels into a mask again.

from scipy.ndimage import measurements
structure = np.zeros((3, 3, 3))
structure[:, 1, 1] = 1
labels, nr_labels = measurements.label(1 - mask, structure=structure)
_, counts = np.unique(labels, return_counts=True)
labels_selected = [i_count for i_count, count in enumerate(counts)
                   if count >= threshold]

... and what have you tried so far? – plonser Nov 24 '15 at 12:15 — plonser, Nov 24 '15 at 12:15

Divakar · Accepted Answer · 2015-11-24T15:40:29.673

That's a classical case of binary closing operation in image-processing. To solve it you can take help from scipy module, specifically - scipy.ndimage.morphology.binary_closing after we feed an appropriate 1D kernel of all ONES and of length threshold. Also, Scipy's binary closing function gives us the closed mask only. So, to get the desired output, we need to OR it with the input mask. Thus, the implementation would look something like this -

from scipy.ndimage import binary_closing
out = mask | binary_closing(mask, structure=np.ones(threshold))

How about a NumPy version of binary closing!

Now, a closing operation is basically image-dilation and image-erosion, so we can simulate that behiviour using the trusty convolution operation and we do have that here in NumPy as np.convolve. Similar to scipy's binary closing operation, we need the same kernel here as well and we would use it both for dilation and erosion. The implementation would be -

def numpy_binary_closing(mask,threshold):

    # Define kernel
    K = np.ones(threshold)

    # Perform dilation and threshold at 1
    dil = np.convolve(mask,K,mode='same')>=1

    # Perform erosion on the dilated mask array and threshold at given threshold
    dil_erd = np.convolve(dil,K,mode='same')>= threshold
    return dil_erd

Sample run -

In [133]: mask
Out[133]: 
array([ True, False, False, False, False,  True,  True, False, False,
        True, False], dtype=bool)

In [134]: threshold = 3

In [135]: binary_closing(mask, structure=np.ones(threshold))
Out[135]: 
array([False, False, False, False, False,  True,  True,  True,  True,
        True, False], dtype=bool)

In [136]: numpy_binary_closing(mask,threshold)
Out[136]: 
array([False, False, False, False, False,  True,  True,  True,  True,
        True, False], dtype=bool)

In [137]: mask | binary_closing(mask, structure=np.ones(threshold))
Out[137]: 
array([ True, False, False, False, False,  True,  True,  True,  True,
        True, False], dtype=bool)

In [138]: mask| numpy_binary_closing(mask,threshold)
Out[138]: 
array([ True, False, False, False, False,  True,  True,  True,  True,
        True, False], dtype=bool)

Runtime tests (Scipy vs Numpy!)

Case #1 : Uniformly sparse

In [163]: mask = np.random.rand(10000) > 0.5

In [164]: threshold = 3

In [165]: %timeit binary_closing(mask, structure=np.ones(threshold))
1000 loops, best of 3: 582 µs per loop

In [166]: %timeit numpy_binary_closing(mask,threshold)
10000 loops, best of 3: 178 µs per loop

In [167]: out1 = binary_closing(mask, structure=np.ones(threshold))

In [168]: out2 = numpy_binary_closing(mask,threshold)

In [169]: np.allclose(out1,out2) # Verify outputs
Out[169]: True

Case #2 : More sparse and bigger threshold

In [176]: mask = np.random.rand(10000) > 0.8

In [177]: threshold = 11

In [178]: %timeit binary_closing(mask, structure=np.ones(threshold))
1000 loops, best of 3: 823 µs per loop

In [179]: %timeit numpy_binary_closing(mask,threshold)
1000 loops, best of 3: 331 µs per loop

In [180]: out1 = binary_closing(mask, structure=np.ones(threshold))

In [181]: out2 = numpy_binary_closing(mask,threshold)

In [182]: np.allclose(out1,out2) # Verify outputs
Out[182]: True

Winner is Numpy and by a big margin!

Boundary conditions

It seems the boundaries need the closing too, if the 1s are close enough to the bounadries. To solve those cases, you can pad one 1 each at the start and end of the input boolean array, use the posted code and then at the end de-select the first and last element. Thus, the complete implementation using scipy's binary_closing approach would be -

mask_ext = np.pad(mask,1,'constant',constant_values=(1))
out = mask_ext | binary_closing(mask_ext, structure=np.ones(threshold))
out = out[1:-1]

Sample run -

In [369]: mask
Out[369]: 
array([False, False,  True, False, False, False, False,  True,  True,
       False, False,  True, False], dtype=bool)

In [370]: threshold = 3

In [371]: mask_ext = np.pad(mask,1,'constant',constant_values=(1))
     ...: out = mask_ext | binary_closing(mask_ext, structure=np.ones(threshold))
     ...: out = out[1:-1]
     ...: 

In [372]: out
Out[372]: 
array([ True,  True,  True, False, False, False, False,  True,  True,
        True,  True,  True,  True], dtype=bool)

The last value in both examples is wrong at least in the sense I mean my question. It should be masked since it is only 1 count. Then I do have to admit that I don't really understand how your `numpy_binary_closing` works... Is it possible to adapt it so it works also with 3D arrays? — Lukas, Nov 24 '15 at 14:16
@LukasBrunner Ah didn't notice that trailing `1` there. I might have to change few things in the solution. So, trying to understand the question again, let's say mask = `[1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0]`, what must be the output then? — Divakar, Nov 24 '15 at 14:24
That would be `[1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]`. That is here if `threshold` is either 2, 3 or 4 hence demanding at least 2, 3 or 4 consecutive zeros in order to leave them unchanged. — Lukas, Nov 24 '15 at 14:29
@LukasBrunner And if mask = `[0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]`, which is the same as the input listed in the question, but with two additional zeroes at the start and again assuming `threshold = 3`, what must be the output? — Divakar, Nov 24 '15 at 14:51
That would be `[1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]` then. — Lukas, Nov 24 '15 at 14:57
@LukasBrunner Check out the edits for solving your 1D array cases. — Divakar, Nov 24 '15 at 15:40

score 1 · Answer 2 · answered Nov 26 '15 at 15:38

Just for the sake of completeness here also the solution for the approach I outlined in my EDIT. It does way worse than both of Divakars solutions performance-wise (about a factor 10 compared to numpy_binary_closing) but allows handling of 3D arrays. In addition it offers the possibility to write out the positions of the clusters (wasn't part of the question but it can be interesting information)

import numpy as np
from scipy.ndimage import measurements

def select_consecutive(mask, threshold):

    structure = np.zeros((3, 3, 3))
    structure[:, 1, 1] = 1
    labels, _ = measurements.label(1 - mask, structure=structure)

    # find positions of all unmasked values
    # object_slices = measurements.find_objects(labels)

    _, counts = np.unique(labels, return_counts=True)
    labels_selected = [i_count for i_count, count in enumerate(counts)
                       if count >= threshold and i_count != 0]
    ind = np.in1d(labels.flatten(), labels_selected).reshape(mask.shape)

    mask_new = np.ones_like(mask)
    mask_new[ind] = 0

    return mask_new

Find consecutive unmasked values

2 Answers2

Linked

Related