Why does the following code return False?

>>> pd.Series([np.nan]) | pd.Series([True])
0    False
dtype: bool
Boann
JZ1
  • Looks like a bug, since the commuted operation yields `True`. You should open an issue on their GitHub. – rafaelc May 08 '20 at 21:01
  • This is interesting. Note, `np.nan or True` evaluates to `nan`, basically, `nan` will propagate in your operations. What is *super* weird is that *actually* `bool(np.nan)` will be `True`, and even more strangely, `pd.Series([np.nan],dtype=np.bool)` gives you a series with a single `True` – juanpa.arrivillaga May 08 '20 at 21:01
  • @juanpa.arrivillaga To make the story more interesting, `pd.NA` (as opposed to `np.nan`) does not propagate. – rafaelc May 08 '20 at 21:03
  • [Here](https://github.com/pandas-dev/pandas/issues/6528)'s a related discussion from the pandas GitHub page. – ayhan May 08 '20 at 21:23
  • Funny indeed, as np.logical_or(np.nan, True) is True. – Roy2012 May 11 '20 at 08:57
  • Related thread here: https://stackoverflow.com/questions/37131462/comparing-logical-values-to-nan-in-pandas-numpy – Ji Wei May 22 '20 at 03:30

2 Answers

I think this is because np.nan is a float, and float's __bool__ returns True for any non-zero value, NaN included:

np.nan.__bool__() == True

In the same way:

>>> np.nan or None
nan

A solution in pandas would be:

pd.Series([np.nan]).fillna(False) | pd.Series([True])
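
As a minimal sketch of the fillna workaround above, the fill can be wrapped in a small helper so that both operands are plain bool before `|` is applied. The name `or_nan_as_false` is hypothetical (not a pandas function), and the explicit `.astype(bool)` is an added assumption to keep the dtype unambiguous:

```python
import numpy as np
import pandas as pd


def or_nan_as_false(left, right):
    """Element-wise OR that counts missing values as False (hypothetical helper)."""
    # Fill missing values first, then cast, so both operands are plain bool.
    left_bool = left.fillna(False).astype(bool)
    right_bool = right.fillna(False).astype(bool)
    return left_bool | right_bool


result = or_nan_as_false(pd.Series([np.nan]), pd.Series([True]))
print(result.tolist())  # [True]
```

Filling both sides makes the result independent of operand order, which is the property the original expression lacks.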

EDIT ***

For clarity: in pandas 0.24.1, the method _bool_method_SERIES (line 1816 of .../pandas/core/ops.py) contains this assignment:

    fill_bool = lambda x: x.fillna(False).astype(bool)

which is where the behaviour you are describing comes from. That is, it has been purposefully designed so that np.nan is treated like a False value whenever an or operation is performed.
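
To make the effect of a fill like this concrete, here is a toy model in plain Python. It is not pandas code: `toy_or` and `is_nan` are hypothetical names, and the two steps marked as assumptions (NaN on the right is filled before the operation, NaN on the left passes into the raw result) are chosen only because they reproduce the asymmetry reported in the comments. The last step mirrors the quoted `x.fillna(False).astype(bool)`.

```python
import math


def is_nan(value):
    """Return True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)


def toy_or(left, right):
    """Toy model of the behaviour discussed in this thread; not pandas code."""
    # Assumption: NaN in the right operand is replaced by False up front.
    right = [False if is_nan(y) else y for y in right]
    unfilled = []
    for x, y in zip(left, right):
        # Assumption: a NaN on the left propagates into the raw result.
        unfilled.append(x if is_nan(x) else (x or y))
    # Mirrors fill_bool: x.fillna(False).astype(bool), applied to the result.
    return [False if is_nan(v) else bool(v) for v in unfilled]


print(toy_or([float("nan")], [True]))  # [False]
print(toy_or([True], [float("nan")]))  # [True]
```

In this model the NaN is turned into False only after it has already replaced the result of the OR, which is why `True` on the right cannot rescue it.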

Reuben
  • *"...so that `np.nan` is treated like a `False` value (whenever doing an or operation)"* - **no**, `np.nan` is not treated as something different, try yourself `np.nan or True` and you will see that the result is `np.nan`. – MarianD May 25 '20 at 17:16
  • @MarianD - hey, I think I referenced that above; but my point is that `pandas` fills `np.nan` with `False` during `__or__` operations - hope that helps. – Reuben May 25 '20 at 21:18
  • **1.** Sorry, you referenced nothing (no links in your answer; BTW, why version 0.24.1?). **2.** If, as you state, “pandas fills `np.nan` with `False`”, why does `False or True` give `False` (as in the OP's example)? – MarianD May 26 '20 at 07:55
  • BTW, you could be more specific directly in your answer. – MarianD May 26 '20 at 08:06

Compare your case (with the explicit dtype to emphasize the inferred one):

In[11]: pd.Series([np.nan], dtype=float) | pd.Series([True])
Out[11]: 
0    False
dtype: bool

with a similar one (only dtype is now bool):

In[12]: pd.Series([np.nan], dtype=bool) | pd.Series([True])
Out[12]: 
0    True
dtype: bool

Do you see the difference?


The explanation:

  1. In the first case (yours), np.nan propagates through the logical or operation under the hood:

    In[13]: np.nan or True
    Out[13]: nan
    

    and pandas then treats np.nan as False in the result of a boolean operation.

     

  2. In the second case the output is unambiguous, as the first series has a boolean value (True, as all non-zero values are considered True, including np.nan, but it doesn't matter in this case):

    In[14]: pd.Series([np.nan], dtype=bool)
    
    Out[14]: 
    0    True
    dtype: bool
    

    and True or True gives True, of course:

    In[15]: True or True
    Out[15]: True
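
The scalar facts used in the explanation above can be checked outside pandas. This is only a sketch of NaN truthiness in Python and NumPy; it does not reproduce what pandas does internally. Note that `or` is Python's scalar operator, which returns one of its operands, whereas `|` on a Series works element-wise.

```python
import math

import numpy as np

# `or` returns its first truthy operand; NaN is truthy, so NaN comes back.
scalar_result = np.nan or True
print(math.isnan(scalar_result))  # True

# NaN is a non-zero float, so its truth value is True.
print(bool(np.nan))  # True

# Casting a float array holding NaN to bool therefore yields True.
as_bool = np.array([np.nan]).astype(bool)
print(as_bool.tolist())  # [True]

# NumPy's logical_or also treats NaN as truthy.
print(bool(np.logical_or(np.nan, True)))  # True
```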
    
MarianD