The implicit index-matching of pandas for operations between different DataFrame/Series objects is great, and most of the time it just works.
However, I've stumbled on an example that does not work as expected:
import pandas as pd # 0.21.0
import numpy as np # 1.13.3
x = pd.Series([True, False, True, True], index = range(4))
y = pd.Series([False, True, True, False], index = [2,4,3,5])
# logical AND: this works, symmetric as it should be
pd.concat([x, y, x & y, y & x], keys = ['x', 'y', 'x&y', 'y&x'], axis = 1)
#        x      y    x&y    y&x
# 0   True    NaN  False  False
# 1  False    NaN  False  False
# 2   True  False  False  False
# 3   True   True   True   True
# 4    NaN   True  False  False
# 5    NaN  False  False  False
# but logical OR is not symmetric anymore (same for XOR: x^y vs. y^x)
pd.concat([x, y, x | y, y | x], keys = ['x', 'y', 'x|y', 'y|x'], axis = 1)
#        x      y    x|y    y|x
# 0   True    NaN   True  False  <-- INCONSISTENT!
# 1  False    NaN  False  False
# 2   True  False   True   True
# 3   True   True   True   True
# 4    NaN   True  False   True  <-- INCONSISTENT!
# 5    NaN  False  False  False
Researching a bit, I found two points that seem relevant:
- bool(np.nan) equals True, cf. https://stackoverflow.com/a/15686477/2965879
- | is resolved to np.bitwise_or, rather than np.logical_or, cf. https://stackoverflow.com/a/37132854/2965879
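A quick way to check these two points in isolation is sketched below. It uses plain NumPy only, with no index alignment involved, and the variable names are purely illustrative:

```python
import numpy as np

# NaN is a non-zero float, so Python treats it as truthy.
nan_is_truthy = bool(np.nan)

# logical_or tests truthiness element-wise, so NaN counts as True here.
nan_or_false = bool(np.logical_or(np.nan, False))

# On boolean arrays, | maps to bitwise_or and agrees with logical_or.
a = np.array([True, False, False])
b = np.array([False, False, True])
same_on_bools = (np.array_equal(a | b, np.bitwise_or(a, b))
                 and np.array_equal(a | b, np.logical_or(a, b)))

# bitwise_or has no float implementation, so it rejects NaN outright.
try:
    np.bitwise_or(np.nan, False)
    bitwise_accepts_nan = True
except TypeError:
    bitwise_accepts_nan = False
```

So the two functions only differ once non-boolean values such as NaN enter the picture, which is exactly what index alignment introduces.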
But ultimately, the kicker seems to be that pandas casts nan to False at some point. Looking at the above, it appears that this happens after calling np.bitwise_or, while I think this should happen before?
In particular, using np.logical_or does not help because it misses the index alignment that pandas does, and also, I don't want np.nan or False to equal True. (In other words, the answer https://stackoverflow.com/a/37132854/2965879 does not help.)
I think that if this wonderful syntactic sugar is provided, it should be as consistent as possible*, and so | should be symmetric. It's really hard to debug (as happened to me) when something that's always symmetric suddenly isn't anymore.
So finally, the question: Is there any feasible workaround (e.g. overloading something) to salvage x|y == y|x, and ideally in such a way that (loosely speaking) nan | True == True == True | nan and nan | False == False == False | nan?
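To make the desired semantics concrete, here is a minimal sketch of the behaviour I am after. It aligns both Series on the union of their indexes before applying |, filling labels missing from either side with False. The helper name symmetric_or is hypothetical (it is not a pandas API), and the sketch assumes neither input contains NaN before alignment:

```python
import pandas as pd

def symmetric_or(a, b):
    """Hypothetical helper: align first, treat missing labels as False, then OR."""
    idx = a.index.union(b.index)
    # fill_value only applies to labels newly introduced by the reindex
    a_full = a.reindex(idx, fill_value=False).astype(bool)
    b_full = b.reindex(idx, fill_value=False).astype(bool)
    return a_full | b_full

x = pd.Series([True, False, True, True], index=range(4))
y = pd.Series([False, True, True, False], index=[2, 4, 3, 5])

xy = symmetric_or(x, y)
yx = symmetric_or(y, x)
```

With this, both call orders give the same values, and a label present on only one side simply keeps that side's value. It is an explicit function call rather than the | operator itself, though, so it does not answer the overloading part of the question.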
*Even if De Morgan's law falls apart regardless: ~(x&y) cannot fully match ~y|~x, because the NaNs only come in at the index alignment (and so are not affected by a previous negation).