
I have some array

a = np.array([1, 2, 3])

and some mask

mask = np.ones(a.shape, dtype=bool)

and can do

np.testing.assert_almost_equal(a[mask], a)  # passes (no exception raised)

However,

np.ma.array(a, mask=mask)

is equivalent to

a[np.logical_not(mask)]

and

np.ma.array(a, mask=np.logical_not(mask))

is equivalent to

a[mask]

This seems counterintuitive to me.

I would love an explanation of this design choice in NumPy.

Gulzar
  • Your mask is an array of integers though, not booleans – yatu May 28 '20 at 13:03
  • @yatu Thanks, fixed. Feel free to edit inaccuracies :) I hope the question itself is clear. – Gulzar May 28 '20 at 13:12
  • Interesting question, but I feel like that's quite literally the definition of a masked array. ["Masked values of True exclude the corresponding element from any computation."](https://numpy.org/doc/stable/reference/generated/numpy.ma.array.html). Maybe [this answer](https://stackoverflow.com/questions/55987642/why-are-numpy-masked-arrays-useful) resolves it? – Bertil Johannes Ipsen May 28 '20 at 15:09
  • Are you asking why the code works the way it does (given how it's documented), or why the developers chose this particular convention (in contrast to what your intuition says it should be)? – hpaulj May 28 '20 at 15:32
  • @hpaulj I actually don't understand the difference between the two things you stated. I asked why np.ma uses a different convention than np. My intuition is that they should be the same; I don't mind which way. – Gulzar May 28 '20 at 15:52

1 Answer

In [6]: a = np.array([1,2,3])                                                            
In [7]: idx = np.array([1,0,1], bool)                                                    
In [8]: idx                                                                              
Out[8]: array([ True, False,  True])
In [9]: a[idx]                                                                           
Out[9]: array([1, 3])

Just because you called a boolean array mask does not mean it behaves as a 'mask' in every sense of the word. I intentionally chose a different name. Yes, we do often call such an array a mask and talk of 'masking', but what we are really doing is 'selecting'. The a[idx] operation returns the elements of a where idx is True. It's the same as indexing with the nonzero tuple:

In [13]: np.nonzero(idx)                                                                 
Out[13]: (array([0, 2]),)
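
The equivalence can be checked directly. This is a minimal sketch that reuses a and idx from the session above; the names selected_bool and selected_int are only illustrative:

```python
import numpy as np

a = np.array([1, 2, 3])
idx = np.array([True, False, True])

# Boolean indexing keeps the elements where idx is True ...
selected_bool = a[idx]
# ... which matches integer indexing with the positions of the True entries.
selected_int = a[np.nonzero(idx)]

print(selected_bool)  # [1 3]
print(selected_int)   # [1 3]
```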

In np.ma, mask is used in the sense of 'mask out', i.e. covering over.

In [10]: mm = np.ma.masked_array(a, mask=idx)                                            
In [11]: mm                                                                              
Out[11]: 
masked_array(data=[--, 2, --],
             mask=[ True, False,  True],
       fill_value=999999)
In [12]: mm.compressed()                                                                 
Out[12]: array([2])

In the display, the masked values show up as '--'. As the np.ma docs say, those elements are considered invalid and are excluded from computations.
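
As a small illustration of 'excluded from computations', here is a sketch that compares reductions on the masked array with reductions on the boolean-indexed array, using the same a and idx as above. The variable names are illustrative, not part of the original session:

```python
import numpy as np

a = np.array([1, 2, 3])
idx = np.array([True, False, True])

mm = np.ma.masked_array(a, mask=idx)

# Masked reductions skip the elements where the mask is True,
# so only the unmasked value 2 contributes.
masked_sum = mm.sum()
masked_mean = mm.mean()

# Boolean indexing does the opposite: it keeps the True positions (1 and 3).
selected_sum = a[idx].sum()

print(masked_sum)    # 2
print(masked_mean)   # 2.0
print(selected_sum)  # 4
```

The masked sum agrees with a[~idx].sum(), which is the inverted relationship the question describes.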

mm.filled returns an array with the 'masked' value replaced by the 'fill':

In [16]: mm.filled()                                                                     
Out[16]: array([999999,      2, 999999])

We can do the same thing with idx, this time modifying a in place:

In [17]: a[idx] = 999999                                                                 
In [18]: a                                                                               
Out[18]: array([999999,      2, 999999])
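
If the original array should stay untouched, one possible alternative is np.where, which builds a new array instead of assigning in place. This is a sketch under that assumption; the explicit fill value 999999 simply mirrors the default shown above, and filled_copy and via_where are illustrative names:

```python
import numpy as np

a = np.array([1, 2, 3])
idx = np.array([True, False, True])

mm = np.ma.masked_array(a, mask=idx)

# filled() returns a new array with masked positions replaced by the fill value.
filled_copy = mm.filled(999999)

# np.where gives the same result from the plain boolean array,
# without modifying a.
via_where = np.where(idx, 999999, a)
```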
hpaulj
  • Thanks for the reply! I asked *why* np and np.ma follow different standards; I understand the differences you specified here. – Gulzar May 28 '20 at 22:50
  • It doesn't really matter what I call the array with which I access the data array; I would expect it to behave the same, and it doesn't. I see that the docs specify the behavior, and I understand the behavior. I do not understand the logic that led to choosing a mask that indicates "what not to select", as opposed to the numpy standard, which is "what to select". – Gulzar May 28 '20 at 22:52
  • What I'm trying to explain is that the two uses of a boolean array are unrelated. Selecting and masking are different tasks, and don't have to follow the same logic. – hpaulj May 29 '20 at 04:19
  • All right, is there an efficient way, without applying `np.logical_not` or `~`, to get a masked array from the already existing (inverted) mask? Without traversing the entire mask for negation first? – Gulzar May 29 '20 at 16:02
  • Why are you afraid of a simple `~`? Don't use `np.ma` if you are seeking efficiency; it's a convenience class, not a performance one. I don't think it has any custom C code; it's all Python. – hpaulj May 29 '20 at 16:16
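
The `~` approach from the last comment can be sketched as follows. The name keep is hypothetical and stands for a "what to select" boolean array in the sense used in the question:

```python
import numpy as np

a = np.array([1, 2, 3])
keep = np.array([True, False, True])  # hypothetical selection-style mask

# np.ma expects True to mean "hide this element", so invert the selection.
mm = np.ma.masked_array(a, mask=~keep)

# The unmasked values are exactly the ones boolean indexing would select.
unmasked = mm.compressed()
selected = a[keep]

print(unmasked)  # [1 3]
print(selected)  # [1 3]
```

The inversion makes one pass over the boolean array and allocates a new one of the same shape.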