If I use the standard Python boolean operators and/or/not, one nice feature is that they treat None the way I would logically expect. That is, not only
True and True == True
True and False == False
but also
True and None == None
False and None == False
True or None == True
False or None == None
This follows the logic that, for instance, if A is False and B is unknown, (A and B) must still be False, while (A or B) is unknown.
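A quick way to check the cases listed above is the small sketch below. It relies only on the fact that Python's `and`/`or` short-circuit and return one of their operands. The names `pairs`, `results` and `reversed_and` are just illustrative. One caveat it also shows: the result depends on operand order, so this is not full three-valued logic.

```python
# Python's `and`/`or` short-circuit and return one of their operands,
# which is why None can come back as the "unknown" result.
pairs = [(True, True), (True, False), (True, None), (False, None)]

# Map each (a, b) pair to the results of `a and b` and `a or b`.
results = {(a, b): (a and b, a or b) for a, b in pairs}

for (a, b), (and_result, or_result) in results.items():
    print(f"{a!s:>5} and {b!s:<5} -> {and_result!s:<5} | "
          f"{a!s:>5} or {b!s:<5} -> {or_result}")

# Caveat: the left operand is evaluated first, so order matters.
# With None on the left, `and` stops immediately and returns None.
reversed_and = None and False
print(reversed_and)
```

So the cases above hold with the known value on the left, but swapping the operands of `False and None` gives None rather than False.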
I needed to perform boolean operations on Pandas DataFrames with missing data, and was hoping I'd be able to use the same logic. For boolean logic on numpy arrays and Pandas series, we need to use bitwise operators &/|/~. Pandas seems to have behaviour that is partially the same as and/or/not, but partially different. In short, it seems to return False when the value should logically be unknown.
For example:
import pandas as pd
a = pd.Series([True, False, True, False])
b = pd.Series([True, True, None, None])
Then we get
> a & b
0 True
1 False
2 False
3 False
dtype: bool
and
> a | b
0 True
1 True
2 True
3 False
I would expect the output of a & b to be a Series [True,False,None,False] and the output of a | b to be a Series [True,True,True,None]. The actual results match what I'd expect, except that they contain False wherever I'd expect a missing value.
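To make the expected output concrete, here is a plain Python sketch of the element-wise three-valued logic I have in mind. The helpers `kleene_and` and `kleene_or` are hypothetical names of my own, not Pandas functions, and plain lists stand in for the two Series.

```python
def kleene_and(x, y):
    """Three-valued AND: a definite False wins, otherwise None propagates."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def kleene_or(x, y):
    """Three-valued OR: a definite True wins, otherwise None propagates."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


# Same values as the Series a and b in the example.
a = [True, False, True, False]
b = [True, True, None, None]

expected_and = [kleene_and(x, y) for x, y in zip(a, b)]
expected_or = [kleene_or(x, y) for x, y in zip(a, b)]

print(expected_and)  # [True, False, None, False]
print(expected_or)   # [True, True, True, None]
```

Unlike the built-in `and`/`or`, these helpers give the same answer regardless of operand order.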
Finally, ~b just gives a TypeError:
TypeError: bad operand type for unary ~: 'NoneType'
which seems odd since & and | at least partially work.
Is there a better way to carry out boolean logic in this situation? Is this a bug in Pandas?
Analogous tests with numpy arrays just give type errors, so I assume Pandas is handling the logic itself here.