import numpy as np

I have two arrays:

a = np.array([1,2,3,np.nan,5])
b = np.array([3,2,np.nan,5,4])

I would like to compare the elements of these two arrays and get a list of booleans as the result. When a nan is involved in the comparison, I'd like to get nan. Expected result:

[False, False, nan, nan, True]

I have achieved the desired output using a list comprehension with an if-else:

[eacha>eachb
 if ~np.isnan(eacha) and ~np.isnan(eachb)
 else np.nan
 for eacha, eachb in zip(a,b)]

Is there a better way of doing this (i.e. one not involving a for loop or if-else statements)?

zabop
  • `np.where(np.isnan(a * b), np.nan, a==b)` should be faster than the given answer since it avoids an unnecessary third operation. Since `NaN` times anything equals NaN, we can use multiplication rather than an `or` at the end. – user3483203 Mar 05 '21 at 19:06
  • If you only care about truthy vs falsey in your final result, you could speed this up even more with `(a * b) * (a == b)` – user3483203 Mar 05 '21 at 19:13
  • Could you elaborate on this last one? I would like to have Trues and Falses in my array, not floats. – zabop Mar 05 '21 at 19:26
  • @zabop if you want `nans` then **you must use a float dtype** – juanpa.arrivillaga Mar 05 '21 at 20:11
  • Why does the datatype of the nan have to be imposed on the whole array? Why can't I have float nans and boolean Trues and Falses? – zabop Mar 05 '21 at 20:55
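
To illustrate the dtype point raised in the comments above, here is a minimal sketch (the names `mixed` and `as_object` are illustrative only). A NumPy array holds a single dtype, so combining booleans with `np.nan` promotes every element to float unless `dtype=object` is requested explicitly.

```python
import numpy as np

# Booleans and nan have no common boolean representation,
# so NumPy promotes the whole array to float64.
mixed = np.array([True, False, np.nan])
print(mixed.dtype)      # float64

# With dtype=object the array stores references to Python objects,
# so True/False and nan can coexist, at the cost of vectorized speed.
as_object = np.array([True, False, np.nan], dtype=object)
print(as_object.dtype)  # object
print(as_object[0] is True)  # True
```

This is why the `np.where(..., np.nan, ...)` results in this thread come back as floats rather than booleans.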

2 Answers


You can try:

np.where(np.isnan(a)|np.isnan(b), np.nan, a==b)

But then you get a float array, since np.nan is float:

array([ 0.,  1., nan, nan,  0.])
Quang Hoang
  • Thank you. If you can suggest a way to make 0.s False and 1.s True, that would be awesome (added such a way, but apparently that's not that great). – zabop Mar 05 '21 at 19:28
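
Note that the question compares with `>`, while the answer above uses `==`. As a rough sketch, the same `np.where` pattern applied with `>` should reproduce the question's expected values, as floats. The names `either_nan` and `greater` are illustrative, and the `np.errstate` guard is only a precaution for NumPy versions that warn on nan comparisons.

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, 5])
b = np.array([3, 2, np.nan, 5, 4])

# True wherever at least one of the two operands is nan.
either_nan = np.isnan(a) | np.isnan(b)

# Any comparison with nan evaluates to False, so those positions
# are overwritten with nan; the result is therefore a float array.
with np.errstate(invalid="ignore"):
    greater = np.where(either_nan, np.nan, a > b)

print(greater)  # values: 0, 0, nan, nan, 1 (dtype float64)
```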

To change Quang Hoang's excellent answer's output from floats to booleans, we can use pandas.Series.replace:

import pandas as pd
pd.Series(np.where(np.isnan(a)|np.isnan(b), np.nan, a==b)).replace({0:False,1:True})

resulting in:

0    False
1     True
2      NaN
3      NaN
4    False
dtype: object

or, converting back to a NumPy array:

pd.Series(np.where(np.isnan(a)|np.isnan(b), np.nan, a==b)).replace({0:False,1:True}).to_numpy()

resulting in:

array([False, True, nan, nan, False], dtype=object)
zabop
  • Not my downvote, but this is *not* an improvement at all. – user3483203 Mar 05 '21 at 19:05
  • Thank you. Edited this post so now improvement is not mentioned, just that the result is boolean this way, not floats. – zabop Mar 05 '21 at 19:27
  • @zabop no, the result is `dtype=object`, which is generally just bad. You lose the advantages of `numpy` that way. This is all probably not an improvement over just using regular python lists and loops. A `dtype=object` numpy array is basically a less performant python list. I didn't downvote, btw, but this might explain the general aversion to this answer – juanpa.arrivillaga Mar 05 '21 at 20:12
  • In my case, performance is not an issue; the readability of the dataframes I am creating is. That is why I need Trues and Falses, not floats. Thank you for the comment, it is useful. – zabop Mar 05 '21 at 20:59
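
If the goal is readable True/False values with missing entries inside a DataFrame, one option not mentioned in the thread is pandas' nullable `"boolean"` extension dtype. The following is only a sketch, assuming pandas 1.0 or later; `missing`, `result`, and the series name `a_gt_b` are illustrative, and the comparison uses `>` as in the question.

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3, np.nan, 5])
b = np.array([3, 2, np.nan, 5, 4])

# Positions where the comparison is undefined.
missing = np.isnan(a) | np.isnan(b)

# BooleanArray pairs a plain boolean array with a mask of missing entries,
# so masked positions are shown as <NA> instead of forcing a float dtype.
result = pd.arrays.BooleanArray(a > b, mask=missing)
series = pd.Series(result, name="a_gt_b")

print(series.dtype)  # boolean
print(series)
```

Unlike the `dtype=object` result, this keeps a dedicated boolean dtype, though missing entries appear as `<NA>` rather than `nan`.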