Python/NumPy first occurrence of masked subarray

Question

I would like to find the occurrences of a subarray in a numpy array, but with a "wildcard".

a = np.array([1, 2, 3, 4, 5])
b = np.ma.array([2, 99, 4], mask=[0, 1, 0])

The idea is that searching for b in a gives a match because 99 is masked.

More specifically, I hoped that the method described here would work, but it does not:

import numpy as np

def rolling_window(a, size):
    # Build a 2-D view whose rows are the consecutive length-`size` windows of a
    shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 3, 4])
c = np.ma.array([2, 99, 4], mask=[0, 1, 0])

workingMatch = rolling_window(a, len(b)) == b
notWorkingMatch = rolling_window(a, len(c)) == c

This results in:

>>> workingMatch
array([[False, False, False],
       [ True,  True,  True],
       [False, False, False]], dtype=bool)

>>> notWorkingMatch
masked_array(data = [[False False False]
                     [-- -- --]
                     [False False False]],
             mask = [False  True False], fill_value = True)

...so no match is found. Why not (I'd like to understand what is going on), and how can I make this work?

hpaulj · Accepted Answer · 2016-08-24T05:54:13.693

Use np.ma.equal instead of ==; see the end of this answer.

========================

A masked array consists of a data array and a mask array. Often the masked array is used in other operations by 'filling' the masked values with something innocuous, or by compressing them out. I'm not entirely sure what's going on with this == test, but let's look at the calculations.
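As a small illustration of the data/mask split and of the 'filling' and 'compressing' mentioned above, here is a minimal sketch using the masked pattern from the question. The fill value 0 is an arbitrary choice for the example, not something the question requires.

```python
import numpy as np

# Same masked pattern as in the question: the middle value is masked.
c = np.ma.array([2, 99, 4], mask=[0, 1, 0])

raw = c.data                  # underlying values, mask ignored: [2, 99, 4]
mask = np.ma.getmaskarray(c)  # boolean mask as a full array: [False, True, False]
filled = c.filled(0)          # masked slots replaced by 0: [2, 0, 4]
kept = c.compressed()         # masked slots dropped: [2, 4]
```

Note that the 99 is still stored in the data array; the mask only marks it as "ignore this".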

Your striding produces an array:

In [614]: A
Out[614]: 
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])

In [615]: b
Out[615]: array([2, 3, 4])

In [612]: A==b
Out[612]: 
array([[False, False, False],
       [ True,  True,  True],
       [False, False, False]], dtype=bool)
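For reference, the same window array can be built without hand-written strides. This is only a sketch, and it assumes NumPy 1.20 or newer, where numpy.lib.stride_tricks.sliding_window_view is available; the names hits and first are just illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 3, 4])

# Each row of A is one length-3 window of a (a read-only view, no copy).
A = sliding_window_view(a, len(b))

# A window matches when every element equals the pattern.
hits = (A == b).all(axis=1)

# Index of the first matching window, or -1 if there is none.
first = int(np.argmax(hits)) if hits.any() else -1
```

With the unmasked pattern b this reports the window starting at index 1, which is the all-True row in the output above.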

The masked array has data and mask

In [616]: c
Out[616]: 
masked_array(data = [2 -- 4],
             mask = [False  True False],
       fill_value = 999999)
In [617]: c.data
Out[617]: array([ 2, 99,  4])
In [618]: c.mask
Out[618]: array([False,  True, False], dtype=bool)
In [619]: (A==c).data
Out[619]: 
array([[False, False, False],
       [ True, False,  True],
       [False, False, False]], dtype=bool)

This data is what we'd expect from A==c.data. The center 99 does not match.

But it looks like the mask is applied to the whole boolean array as though c were a column array: it's masking the 2nd row, rather than the 2nd column.

In [624]: A==c
Out[624]: 
masked_array(data =
 [[False False False]
 [-- -- --]
 [False False False]],
             mask =
 [False  True False],
       fill_value = True)

My first impression is that this is an error, but I'll have to dig more.

The data of A==c is 2d, but the mask is 1d.

If I replicate c to 3 rows, I get the desired result:

In [638]: c[None,:]*np.array([1,1,1])[:,None]
Out[638]: 
masked_array(data =
 [[2 -- 4]
 [2 -- 4]
 [2 -- 4]],
             mask =
 [[False  True False]
 [False  True False]
 [False  True False]],
       fill_value = 999999)
In [639]: c1=c[None,:]*np.array([1,1,1])[:,None]
In [640]: A==c1
Out[640]: 
masked_array(data =
 [[False -- False]
 [True -- True]
 [False -- False]],
             mask =
 [[False  True False]
 [False  True False]
 [False  True False]],
       fill_value = True)
In [641]: (A==c1).all(axis=1)
Out[641]: 
masked_array(data = [False True False],
             mask = [False False False],
       fill_value = True)

I don't know if there's a cleaner way of doing this, but it indicates the direction such a solution needs to take.
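One possibly cleaner route in the same direction is to skip masked comparison altogether: broadcast the 1d mask to the 2d shape explicitly and OR it with a plain comparison against c.data. This is a sketch of that idea, not what the session above does; the names mask2d, agree and row_matches are made up for the example.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]])
c = np.ma.array([2, 99, 4], mask=[0, 1, 0])

# Repeat the 1-D mask along the rows of A (a view, no multiplication needed).
mask2d = np.broadcast_to(np.ma.getmaskarray(c), A.shape)

# A position agrees if the values are equal OR the pattern slot is masked.
agree = (A == c.data) | mask2d

# A row matches the pattern when every position agrees.
row_matches = agree.all(axis=1)
```

Because the result is an ordinary boolean array, it can be passed straight to np.flatnonzero or np.argmax to locate the match.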

============

np.ma.equal does what we want (an == comparison with the correct mask):

In [645]: np.ma.equal(A,c)
Out[645]: 
masked_array(data =
 [[False -- False]
 [True -- True]
 [False -- False]],
             mask =
 [[False  True False]
 [False  True False]
 [False  True False]],
       fill_value = 999999)
In [646]: np.ma.equal(A,c).all(axis=1)
Out[646]: 
masked_array(data = [False True False],
             mask = [False False False],
       fill_value = True)

np.ma.equal is a mask-aware version of np.equal, which is the ufunc version of the element-by-element == operator.
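Putting the pieces together, the title question (first occurrence of a masked subarray) could be answered along these lines. This is a sketch under assumptions: first_masked_match is a hypothetical helper name, returning -1 for "no match" is an arbitrary convention, and the striding helper is the one from the question.

```python
import numpy as np

def rolling_window(a, size):
    # Same striding helper as in the question.
    shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def first_masked_match(a, pattern):
    """Hypothetical helper: index of the first window of `a` that matches
    `pattern`, treating masked pattern entries as wildcards; -1 if none."""
    if len(pattern) > len(a):
        return -1
    windows = rolling_window(a, len(pattern))
    eq = np.ma.equal(windows, pattern)
    # all() treats masked entries as True; filled(True) makes a row whose
    # entries are all masked (an all-wildcard pattern) count as a match.
    hits = eq.all(axis=1).filled(True)
    idx = np.flatnonzero(hits)
    return int(idx[0]) if idx.size else -1

a = np.array([1, 2, 3, 4, 5])
c = np.ma.array([2, 99, 4], mask=[0, 1, 0])
position = first_masked_match(a, c)  # window [2, 3, 4] starts at index 1
```

The rows are reduced with all rather than any, since a window only counts as an occurrence when every unmasked element agrees.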

Great! Thanks for this detailed answer. – Louic, Aug 24 '16 at 07:01
