Comparing NumPy arrays so that NaNs yield NaNs

Question

import numpy as np

I have two arrays:

a = np.array([1,2,3,np.nan,5])
b = np.array([3,2,np.nan,5,4])

I would like to compare the elements in these two arrays and get a list of booleans as the result. When a NaN is involved in a comparison, I'd like to get nan instead. Expected result (comparing `a > b`):

[False, False, nan, nan, True]

I have achieved the desired output using an if-else involving list comprehension:

[eacha > eachb
 if not np.isnan(eacha) and not np.isnan(eachb)
 else np.nan
 for eacha, eachb in zip(a, b)]

Is there a better way of doing this, i.e. one that does not involve a for loop or if-else statements?

`np.where(np.isnan(a * b), np.nan, a==b)` should be faster than the given answer since it avoids an unnecessary third operation. Since `NaN` times anything equals NaN, we can use multiplication rather than an `or` at the end. — user3483203, Mar 05 '21 at 19:06
If you only care about truthy vs falsey in your final result, you could speed this up even more with `(a * b) * (a == b)` — user3483203, Mar 05 '21 at 19:13
Could you elaborate on this last one? I would like to have Trues and Falses in my array, not floats. — zabop, Mar 05 '21 at 19:26
@zabop if you want `nans` then **you must use a float dtype** — juanpa.arrivillaga, Mar 05 '21 at 20:11
Why does the datatype of the nan have to be imposed on the whole array? Why can't I have float nans alongside boolean Trues and Falses? — zabop, Mar 05 '21 at 20:55
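NumPy arrays are homogeneous: every element shares one dtype. When `np.nan` (a float) is combined with booleans, NumPy promotes everything to the common type, `float64`, so `True` becomes `1.0` and `False` becomes `0.0`. A minimal sketch illustrating this promotion, using `>` as in the question:

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, 5])
b = np.array([3, 2, np.nan, 5, 4])

# Mixing Python bools with a float NaN promotes everything to float64:
# True -> 1.0, False -> 0.0, NaN stays NaN.
mixed = np.array([True, False, np.nan])
print(mixed.dtype)  # float64

# np.where follows the same promotion rules: the float branch (np.nan)
# and the bool branch (a > b) combine into a single float64 result.
result = np.where(np.isnan(a) | np.isnan(b), np.nan, a > b)
print(result.dtype)  # float64

# The common type of bool and float64 is float64.
print(np.result_type(np.bool_, np.float64))  # float64
```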

score 4 · Answer 1 · answered Mar 05 '21 at 18:30

You can try:

np.where(np.isnan(a)|np.isnan(b), np.nan, a==b)

But then you get a float array, since np.nan is float:

array([ 0.,  1., nan, nan,  0.])
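This answer compares with `==`, whereas the question's loop uses `>`; the same `np.where` pattern works for any elementwise comparison. The sketch below uses `>` to reproduce the question's expected output (as floats) and also tries the `np.isnan(a * b)` mask suggested in the comments. One caveat not raised in the thread: `np.inf * 0` is also NaN, so the multiplication mask can flag a pair that contains no NaN at all.

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, 5])
b = np.array([3, 2, np.nan, 5, 4])

# Mask positions where either input is NaN, then compare the rest.
nan_mask = np.isnan(a) | np.isnan(b)
greater = np.where(nan_mask, np.nan, a > b)
print(greater)  # 0./1. floats, nan where either input was NaN

# Mask from the comments: NaN propagates through multiplication.
product_mask = np.isnan(a * b)
print(np.array_equal(nan_mask, product_mask))  # True for these inputs

# Caveat: inf * 0 is NaN as well, so the product mask flags this pair
# even though neither input is NaN.
x = np.array([np.inf])
y = np.array([0.0])
with np.errstate(invalid="ignore"):  # silence the inf * 0 warning
    print(np.isnan(x * y))
print(np.isnan(x) | np.isnan(y))
```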

Quang Hoang

Thank you. If you can suggest a way to turn the 0.s into False and the 1.s into True, that would be awesome (I added one such way as an answer, but apparently it's not that great). – zabop Mar 05 '21 at 19:28
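If the goal is to keep genuine booleans while still marking the undefined positions, one NumPy-only option (not from the original thread) is a masked array via `np.ma.masked_array`: the data stay `bool`, and the NaN positions are masked rather than stored as `nan`. A minimal sketch, using `>` as in the question:

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, 5])
b = np.array([3, 2, np.nan, 5, 4])

# True wherever the comparison is undefined because of a NaN.
nan_mask = np.isnan(a) | np.isnan(b)

# A masked array keeps the boolean dtype; masked entries are displayed
# as "--" instead of being converted to a float nan.
result = np.ma.masked_array(a > b, mask=nan_mask)
print(result)
print(result.dtype)  # bool
```

The trade-off is that downstream code must be mask-aware (e.g. use `np.ma` functions), but no float or `object` conversion is needed.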

score 1 · Answer 2 · edited Mar 05 '21 at 19:27

To convert the output of Quang Hoang's excellent answer from floats to booleans, we can use pandas.Series.replace:

import pandas as pd

pd.Series(np.where(np.isnan(a)|np.isnan(b), np.nan, a==b)).replace({0:False,1:True})

resulting in:

0    False
1     True
2      NaN
3      NaN
4    False
dtype: object

or, to get a NumPy array:

pd.Series(np.where(np.isnan(a)|np.isnan(b), np.nan, a==b)).replace({0:False,1:True}).to_numpy()

resulting in:

array([False, True, nan, nan, False], dtype=object)
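The pandas round trip can be avoided: converting the boolean comparison to `object` dtype first and then writing `np.nan` into the NaN positions gives the same mixed `True`/`False`/`nan` array directly. This is a sketch assuming an `object` result is acceptable; as the comments on this answer point out, `object` arrays give up most of NumPy's performance advantages.

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, 5])
b = np.array([3, 2, np.nan, 5, 4])

nan_mask = np.isnan(a) | np.isnan(b)

# astype(object) turns each np.bool_ into a Python bool.
result = (a == b).astype(object)
# An object array can hold a float NaN alongside the bools.
result[nan_mask] = np.nan
print(result)  # False, True, nan, nan, False with dtype=object
```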

answered Mar 05 '21 at 18:42

zabop

Not my downvote, but this is *not* an improvement at all. – user3483203 Mar 05 '21 at 19:05
Thank you. I edited this post so that it no longer claims an improvement, only that the result is boolean this way rather than float. – zabop Mar 05 '21 at 19:27

@zabop no, the result is `dtype=object`, which is generally just bad. You lose the advantages of `numpy` that way. This is probably not an improvement over just using regular Python lists and loops: a `dtype=object` numpy array is basically a less performant Python list. I didn't downvote, btw, but this might explain the general aversion to this answer – juanpa.arrivillaga Mar 05 '21 at 20:12
In my case, performance is not an issue; the readability of the dataframes I am creating is. That is why I need Trues and Falses, not floats. Thank you for the comment, it is useful. – zabop Mar 05 '21 at 20:59
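Since readability in pandas DataFrames is the goal, another option (not from the original thread, assuming pandas 1.0 or newer) is pandas' nullable `"boolean"` extension dtype, built here with `pd.arrays.BooleanArray(values, mask)`. It stores real booleans plus a missing-value mask and displays missing entries as `<NA>` rather than `NaN`. A minimal sketch; the column names `a`, `b`, and `a_gt_b` are hypothetical:

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3, np.nan, 5])
b = np.array([3, 2, np.nan, 5, 4])

nan_mask = np.isnan(a) | np.isnan(b)

# BooleanArray takes the boolean values plus a mask of missing entries;
# masked positions are shown as <NA>.
flags = pd.arrays.BooleanArray(a > b, nan_mask)

# Hypothetical column names, for illustration only.
df = pd.DataFrame({"a": a, "b": b, "a_gt_b": flags})
print(df)
print(df["a_gt_b"].dtype)  # boolean
```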