
I have a MaskedArray a of shape (L,M,N), and I want to transfer the unmasked elements to a normal array b (with the same shape), such that along the last dimension, the first elements receive the non-masked values, and the remaining elements are zero. For example, in 2D:

a = [[--,  1,  2, --,  7, --,  5],
     [3 , --, --,  2, --, --, --]]

# Transfer to:

b = [[1, 2, 7, 5, 0, 0, 0],
     [3, 2, 0, 0, 0, 0, 0]]

The simplest way to do this would be over a for loop, e.g.,

for idx in np.ndindex(np.shape(a)[:-1]):
   num = a[idx].count()
   b[idx][:num] = a[idx].compressed()
   # or perhaps,
   # b[idx][:num] = a[idx][~a[idx].mask]

But this will be very slow for large arrays (and in fact, I have many different arrays with the same mask values, all of which I'd like to convert in the same way). Is there a fancy slicing way to do this?


Edit: Here is one way to construct the appropriate indexing tuple to assign the values, but it seems ugly. Perhaps there's something better?

b = np.zeros(a.shape)
# Construct a list with a list for each dimension.
left = [[] for ii in range(a.ndim)]
# In each sub-list, construct the indices to `b` to store each value from `a`
for idx in np.ndindex(a.shape[:-1]):
    num = a[idx].count()
    # here `ii` is the dimension number, and `jj` the index in that dimension
    for ii, jj in enumerate(idx):
        left[ii] = left[ii] + num*[jj]
    # The last dimension is just consecutive numbers for as many values
    left[-1] = left[-1] + list(range(num))

# Index with a tuple of lists: `b` is the target, `a` the masked source.
b[tuple(left)] = a.compressed()
DilithiumMatrix
  • Padding an irregular list of arrays, and moving all 0s to the end of the row, are similar problems that have been discussed on SO. You'll need at least one loop to get the number of unmasked (or masked) elements per row. – hpaulj Sep 22 '17 at 02:22
  • A pad with `nan` (or 0s) question with several good answers: https://stackoverflow.com/questions/40569220/efficiently-convert-uneven-list-of-lists-to-minimal-containing-array-padded-with – hpaulj Sep 22 '17 at 02:36

1 Answer


Adapting @divakar's answers from the linked 'pad with 0s' questions,

Convert Python sequence to NumPy array, filling missing values

In [464]: a=np.array([[0,1,2,0,7,0,5],[3,0,0,2,0,0,0]])
In [465]: Ma = np.ma.masked_equal(a, 0)
In [466]: Ma
Out[466]: 
masked_array(data =
 [[-- 1 2 -- 7 -- 5]
 [3 -- -- 2 -- -- --]],
             mask =
 [[ True False False  True False  True False]
 [False  True  True False  True  True  True]],
       fill_value = 0)

Getting the number of 0s we need to pad with is easy here: just sum the True values in the mask.

In [467]: cnt=Ma.mask.sum(axis=1)  # also np.ma.count_masked(Ma,1)
In [468]: cnt
Out[468]: array([3, 5])
In [469]: mask=(7-cnt[:,None])>np.arange(7) # key non intuitive step
In [470]: mask
Out[470]: 
array([[ True,  True,  True,  True, False, False, False],
       [ True,  True, False, False, False, False, False]], dtype=bool)

The mask is constructed so that, in each row, the first 7 - cnt elements (that is, as many as there are unmasked values) are True, and the rest are False.
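The comparison in In [469] relies on broadcasting a column of per-row counts against a row of column indices. As a minimal sketch of that step on its own (the names `width`, `n_valid`, and `left_mask` are introduced here for illustration and do not appear in the session above):

```python
import numpy as np

cnt = np.array([3, 5])      # masked values per row, as in Out[468]
width = 7                   # length of the last axis
n_valid = width - cnt       # unmasked values per row

# A (2, 1) column compared against a (7,) row broadcasts to shape (2, 7):
# position j in row i is True exactly when j < n_valid[i].
left_mask = np.arange(width) < n_valid[:, None]
print(left_mask.astype(int))
```

Writing the test as `j < n_valid[i]` is equivalent to `(7 - cnt[:, None]) > np.arange(7)`; it may just be easier to read as "the first `n_valid[i]` columns of row `i`".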

Now just use this mask to copy the compressed values to a blank array:

In [471]: M=np.zeros((2,7),int)
In [472]: M[mask]=Ma.compressed()
In [473]: M
Out[473]: 
array([[1, 2, 7, 5, 0, 0, 0],
       [3, 2, 0, 0, 0, 0, 0]])

I had to fiddle around with the cnt and np.arange(7) to get the desired mix of True/False values (left justified Trues).

Count unmasked values per row:

In [486]: np.ma.count(Ma,1)
Out[486]: array([4, 2])

Generalizing this to N-dimensions:

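def compress_masked_array(vals, axis=-1, fill=0.0):
    # Move the target axis last so the boolean assignment below, which fills
    # in C order, matches the order of the values from compressed().
    vals = np.moveaxis(np.ma.asarray(vals), axis, -1)
    # getmaskarray returns a full boolean array even when no mask is set.
    cnt = np.ma.getmaskarray(vals).sum(axis=-1)
    num = vals.shape[-1]
    # True for the first (num - cnt) positions along the last axis.
    mask = (num - cnt[..., np.newaxis]) > np.arange(num)
    n = np.full(vals.shape, fill, dtype=float)
    n[mask] = vals.compressed()
    # Restore the original axis order.
    return np.moveaxis(n, -1, axis)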
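The question mentions many arrays that share the same mask. In that case the two boolean index arrays only need to be built once and can be reused for every array. The following is a rough sketch under that assumption; the helper name `left_justify_indices` and the random test data are made up for illustration, and it only handles packing along the last axis:

```python
import numpy as np

def left_justify_indices(mask):
    """Return (src, dst) boolean arrays for a shared mask (True = masked)."""
    mask = np.asarray(mask, dtype=bool)
    # Number of unmasked values in each row along the last axis.
    n_valid = (~mask).sum(axis=-1)
    # dst marks the first n_valid positions of each row.
    dst = np.arange(mask.shape[-1]) < n_valid[..., np.newaxis]
    return ~mask, dst

# Illustrative data: three masked arrays of shape (2, 3, 5) sharing one mask.
rng = np.random.default_rng(0)
shared_mask = rng.random((2, 3, 5)) < 0.4
arrays = [np.ma.array(rng.random((2, 3, 5)), mask=shared_mask) for _ in range(3)]

# Build the index arrays once, then reuse them for every array.
src, dst = left_justify_indices(shared_mask)
packed = []
for arr in arrays:
    out = np.zeros(arr.shape)
    # Both sides are traversed in C order, so values stay in their rows.
    out[dst] = arr.data[src]
    packed.append(out)
```

Because `src` and `dst` have the same number of True entries in each row, the boolean read and the boolean write line up row by row without any Python-level loop over the leading dimensions.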
DilithiumMatrix
hpaulj