Lets try an example that produces a ragged result - different number of 'masked' values in each row.
In [292]: a=np.arange(12).reshape(3,4)
In [293]: a
Out[293]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [294]: a<6
Out[294]:
array([[ True, True, True, True],
[ True, True, False, False],
[False, False, False, False]], dtype=bool)
The flattened list of values that match this condition. It can't return a regular 2d array, so it has to revert to a flattened array.
In [295]: a[a<6]
Out[295]: array([0, 1, 2, 3, 4, 5])
do the same thing, but iterating row by row
In [296]: [a1[a1<6] for a1 in a]
Out[296]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)]
Trying to make an array of the result produces an object type array, which is little more than a list in an array wrapper:
In [297]: np.array([a1[a1<6] for a1 in a])
Out[297]: array([array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)], dtype=object)
The fact that the result is ragged is a good indicator that it is difficult, if not impossible, to perform that action with one vectorized operation.
Here's another way of producing the list of arrays. With sum
I find how many elements there are in each row, and then use this to split
the flattened array into sublists.
In [320]: idx=(a<6).sum(1).cumsum()[:-1]
In [321]: idx
Out[321]: array([4, 6], dtype=int32)
In [322]: np.split(a[a<6], idx)
Out[322]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=float64)]
It does use 'flattening'. And for these small examples it is slower than the row iteration. (Don't worry about the empty float array, split
had to construct something and used a default dtype. )
A different mask, without empty rows clearly shows the equivalence of the 2 approaches.
In [344]: mask=np.tri(3,4,dtype=bool) # lower tri
In [345]: mask
Out[345]:
array([[ True, False, False, False],
[ True, True, False, False],
[ True, True, True, False]], dtype=bool)
In [346]: idx=mask.sum(1).cumsum()[:-1]
In [347]: idx
Out[347]: array([1, 3], dtype=int32)
In [348]: [a1[m] for a1,m in zip(a,mask)]
Out[348]: [array([0]), array([4, 5]), array([ 8, 9, 10])]
In [349]: np.split(a[mask],idx)
Out[349]: [array([0]), array([4, 5]), array([ 8, 9, 10])]