Get the indices of the first and last rows and columns containing non-masked values in a numpy 2D array

Question

With a 2D masked array in Python, what would be the best way to get the index of the first and last rows and columns containing a non-masked value?

import numpy as np
a = np.reshape(range(30), (6,5))
amask = np.array([[True, True, False, True, True],
                  [True, False, False, True, True],
                  [True, True, True, False, True],
                  [True, False, False, False, True],
                  [True, True, True, False, True],
                  [True, True, True, True, True]])
a = np.ma.masked_array(a, amask)
print(a)
# [[-- -- 2 -- --]
#  [-- 6 7 -- --]
#  [-- -- -- 13 --]
#  [-- 16 17 18 --]
#  [-- -- -- 23 --]
#  [-- -- -- -- --]]

In this example, I would like to obtain:

(0, 4) for axis 0 (since the first row with unmasked value(s) is 0 and the last one is 4; the 6th row (row 5) only contains masked values)
(1, 3) for axis 1 (since the first column with unmasked value(s) is 1 and the last one is 3 (the 1st and 5th columns only contain masked values)).

[I thought about maybe combining numpy.ma.flatnotmasked_edges and numpy.apply_along_axis, without any success...]

Could you explain the expected output? – Divakar Sep 27 '18 at 08:30
@Divakar I edited my question - hope it is clearer now. – ztl Sep 27 '18 at 08:35

Space Impact · Accepted Answer · 2018-09-27T10:18:12.617

1

IIUC you can do:

d = amask==False #True where the array values are NOT masked
rows,columns = np.where(d) #Get the row and column positions of the unmasked values

rows.sort() #sort the row values
columns.sort() #sort the column values

print('Row values :',(rows[0],rows[-1])) #print the first and last rows
print('Column values :',(columns[0],columns[-1])) #print the first and last columns

Row values : (0, 4)
Column values : (1, 3)

Or

rows, columns = np.nonzero(~a.mask)
print('Row values :',(rows.min(), rows.max())) #print the min and max rows
print('Column values :',(columns.min(), columns.max())) #print the min and max columns

Row values : (0, 4)
Column values : (1, 3)
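
For reuse, the second variant could be wrapped in a small function. This is only a sketch: the name `unmasked_bounds` is made up for illustration, and returning `None` for a fully masked array is an assumption about what you would want in that case (otherwise `rows.min()` raises on an empty array).

```python
import numpy as np

def unmasked_bounds(arr):
    """Return ((first_row, last_row), (first_col, last_col)) of the unmasked data, or None."""
    # getmaskarray always yields a full boolean array, even if arr has no mask set
    valid = ~np.ma.getmaskarray(arr)
    rows, cols = np.nonzero(valid)
    if rows.size == 0:
        # every value is masked, so there are no bounds to report
        return None
    return (int(rows.min()), int(rows.max())), (int(cols.min()), int(cols.max()))

# Rebuild the array from the question: start fully masked, then unmask the valid cells
amask = np.ones((6, 5), dtype=bool)
for r, c in [(0, 2), (1, 1), (1, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 3)]:
    amask[r, c] = False
a = np.ma.masked_array(np.arange(30).reshape(6, 5), mask=amask)

bounds = unmasked_bounds(a)
empty_bounds = unmasked_bounds(np.ma.masked_all((3, 3)))
print(bounds)        # ((0, 4), (1, 3))
print(empty_bounds)  # None
```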

edited Sep 27 '18 at 10:18

answered Sep 27 '18 at 08:34

Space Impact


Thanks! IIUC, it would seem perhaps clearer and more straightforward to me to do `rows, columns = np.nonzero(~a.mask)` and then `(rows.min(), rows.max())` and `(columns.min(), columns.max())`? I like the approach! – ztl Sep 27 '18 at 09:39
@ztl you can do that too. – Space Impact Sep 27 '18 at 09:40

Divakar · Answer 2 · 2018-09-27T09:01:44.300

1

Here's one based on argmax -

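# Flag the columns (m0) and rows (m1) that hold at least one unmasked value.
# Work on the mask itself: a.all() would test the data values, not the mask.
m0 = (~a.mask).any(axis=0)
m1 = (~a.mask).any(axis=1)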

# Use argmax to get first and last non-zero indices along axis=0,1 separately
axis0_out = m1.argmax(), a.shape[0] - m1[::-1].argmax() - 1
axis1_out = m0.argmax(), a.shape[1] - m0[::-1].argmax() - 1
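
One caveat with argmax: on an all-False array it returns 0, so a fully masked input would silently give `(0, n-1)`. Below is a self-contained sketch of the same idea with a guard for that case. The helper name `first_last_true` is hypothetical, and returning `None` when nothing is unmasked is an assumption, not part of the original approach.

```python
import numpy as np

def first_last_true(flags):
    """Return (first, last) index of True in a 1-D boolean array, or None if there is none."""
    if not flags.any():
        # argmax of an all-False array is 0, which would look like a valid index
        return None
    first = int(flags.argmax())                        # first True from the front
    last = int(flags.size - flags[::-1].argmax() - 1)  # first True from the back
    return first, last

# Rebuild the array from the question: start fully masked, then unmask the valid cells
amask = np.ones((6, 5), dtype=bool)
for r, c in [(0, 2), (1, 1), (1, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 3)]:
    amask[r, c] = False
a = np.ma.masked_array(np.arange(30).reshape(6, 5), mask=amask)

valid = ~np.ma.getmaskarray(a)                   # True where a value is unmasked
axis0_out = first_last_true(valid.any(axis=1))   # rows with any unmasked value
axis1_out = first_last_true(valid.any(axis=0))   # columns with any unmasked value
print(axis0_out, axis1_out)  # (0, 4) (1, 3)
```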

edited Sep 27 '18 at 09:01

answered Sep 27 '18 at 08:49

Divakar

