Why does pandas "None | True" return False when Python "None or True" returns True?

Question

In pure Python, None or True returns True.
However, with pandas, when I apply | between two Series containing None values, the results are not what I expected:

>>> df.to_dict()
{'buybox': {0: None}, 'buybox_y': {0: True}}
>>> df
    buybox  buybox_y
0   None    True

>>> df['buybox'] = (df['buybox'] | df['buybox_y'])
>>> df
    buybox  buybox_y
0   False   True

Expected result:

>>> df
    buybox  buybox_y
0   True    True

I get the result I want by applying the OR operation twice, but I don't understand why that should be necessary.

I'm not looking for a workaround (I already have one: applying df['buybox'] = (df['buybox'] | df['buybox_y']) twice in a row) but for an explanation, hence the 'why' in the title.

`|` and `or` are two entirely different operators. Note that `None | True` produces a type error. — chepner, Apr 06 '21 at 14:35
@chepner: Yeah, but Pandas uses `|` for logical or, and we're not getting a TypeError. We're getting False somehow. — user2357112, Apr 06 '21 at 14:37
Pandas doc (https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing) specifies that `|` is used for logical or and not bitwise or. My pandas version is 1.2.0 — politinsa, Apr 06 '21 at 14:39
"Somehow" would appear to mean that `__or__` is implemented to convert `None` to a `bool` first. `or` isn't really a boolean operator, but it uses boolean *equivalents* to determine which argument to return. — chepner, Apr 06 '21 at 14:53
Additional weirdness: [if you switch the argument order, you get True instead](https://ideone.com/4eJdq3)! — user2357112, Apr 06 '21 at 14:55
Also, this is likely a bug: `None` is interpreted as truthy when evaluating the or `|` and as falsey when converted to boolean. The second part is easy to verify as `df['buybox'].astype(bool)` gets to `False`. — norok2, Apr 06 '21 at 14:55
Huh... experiment actually contradicts the Pandas documentation. The docs say Pandas logical operations on NaN always return False, but `pandas.Series([True]) | pandas.Series([nan])` has a `True` instead of `False` in the result. (Putting the NaN first gives False.) — user2357112, Apr 06 '21 at 14:59
@norok2: If `None` were treated as truthy in the `|`, then we'd get True, not False. — user2357112, Apr 06 '21 at 15:00
@user2357112supportsMonica no, you would get the object, not True. Compare with `1 or True -> 1`. Likely, `|` is short-circuiting and not even caring what is on the other side, as your finding of swapping the order of operands suggests. — norok2, Apr 06 '21 at 15:01
There's a [related issue](https://github.com/pandas-dev/pandas/issues/6528) on the tracker for NaN. It looks like this is just treated as known weirdness. — user2357112, Apr 06 '21 at 15:11
Note that we don't particularly deal in "why"s here. We deal in concrete, practical questions with concrete answers; a "why" doesn't always have a rationale, beyond "that scenario wasn't included during design and failed to be considered". See f/e [What is the rationale for closing "why" questions on language design?](https://meta.stackexchange.com/a/170415/144918) — Charles Duffy, Apr 09 '21 at 14:06
@CharlesDuffy I don't see the question as that type of why. This why is more of a "This code does something else from what I would expect. What am I overlooking? Where is my mistake?" which to me seems like a very common and meaningful type of question on Stack Overflow. And pointing to how the or operators are defined in pandas, or what bug this behaviour is a consequence of (I don't know which is the case), would answer the question. The OP doesn't ask _why_ the operators are defined like that or _why_ there is a bug; only in those cases would it be a why of the type you mention. — Jesper, Apr 09 '21 at 14:43
@Jesper, I generally agree; it's that the comments asserting that there _is_ a bug were ignored / treated as nonresponsive by the OP (and the question had a bounty added with a message refocusing on the interest being an explanation rather than a workaround) that led to the above comment. — Charles Duffy, Apr 09 '21 at 17:41

paiv · Accepted Answer · 2021-04-10T05:28:33.610

The pandas | operator does not rely on the Python or expression, and it behaves differently.

If both operands are boolean, the result is well defined and identical in Python and pandas.
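For reference, here is the plain-Python side of the comparison, with no pandas involved. This is a small sketch: with two genuine bools, | and or agree; with None, or still returns one of its operands chosen by truth testing, while | raises TypeError, as noted in the comments on the question.

```python
# Plain Python, no pandas involved.
# For two genuine bools, `|` and `or` always agree.
for a in (True, False):
    for b in (True, False):
        assert (a | b) == (a or b)

# `or` returns one of its operands, picked by truth testing:
# None is falsy, so the second operand (True) is returned.
or_result = None or True

# `|` needs an __or__/__ror__ implementation that accepts both operands;
# neither NoneType nor bool provides one for this pair, so it raises.
try:
    None | True
    raised_type_error = False
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)

print(or_result, raised_type_error)  # True True
```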

But in your case the Series "buybox" has dtype object, while "buybox_y" has dtype bool. With these mixed dtypes, the pandas | operator is not commutative:

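- the right operand is coerced to boolean (missing values such as None become False)
- then an elementwise bitwise or is attempted; None | True is an invalid operation, so that element becomes None
- finally, the result is coerced to boolean, which turns that None into False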

Thus,

>>> df['buybox'] | df['buybox_y']
0  False

>>> df['buybox_y'] | df['buybox']
0  True
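The steps above can be mimicked in plain Python. This is a minimal sketch under stated assumptions, not pandas' real source: a Series is modelled as a list, only None counts as missing, and the helper names and the left_is_bool flag (standing in for "the left operand has dtype bool") are hypothetical. It reproduces both operand orders and also suggests why running the assignment a second time gives True: after the first pass "buybox" holds a genuine bool False, so the second pass is an ordinary False | True.

```python
# Pure-Python sketch (NOT pandas' actual code) of the three steps above.
# Lists stand in for Series; only None is treated as missing.
# All names here are hypothetical, chosen for illustration.

def fill_missing_with_false(values):
    """Replace missing values (None here) with False."""
    return [False if v is None else v for v in values]


def try_elementwise_or(left, right):
    """Apply | element by element; on TypeError keep the missing operand."""
    out = []
    for x, y in zip(left, right):
        try:
            out.append(x | y)
        except TypeError:
            # e.g. None | True: propagate the missing value instead of raising
            if x is None:
                out.append(x)
            elif y is None:
                out.append(y)
            else:
                raise
    return out


def sketch_series_or(left, right, left_is_bool):
    # Step 1: coerce the right operand (missing -> False; cast to bool if left is bool)
    right = fill_missing_with_false(right)
    if left_is_bool:
        right = [bool(v) for v in right]
    # Step 2: attempt the elementwise bitwise or
    result = try_elementwise_or(left, right)
    # Step 3: coerce the result to bool (missing -> False)
    return [bool(v) for v in fill_missing_with_false(result)]


# object-like left, bool right: None | True fails, becomes None, then False
first = sketch_series_or([None], [True], left_is_bool=False)
# bool left, object-like right: the right None becomes False, True | False is True
swapped = sketch_series_or([True], [None], left_is_bool=True)
# Second application: the left side is now a genuine bool (False)
second = sketch_series_or(first, [True], left_is_bool=True)
print(first, swapped, second)  # [False] [True] [True]
```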

For predictable results, clean up the data and cast it to a boolean dtype with pandas astype before attempting boolean operations.
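A minimal sketch of that advice, assuming pandas 1.x or 2.x and the one-row frame from the question. It shows two ways to get a clean boolean column before using |: Series.eq(True), which maps None (and NaN) to False and returns a plain bool Series, and the nullable "boolean" dtype, whose | follows three-valued (Kleene) logic so that NA | True is True. Plain astype(bool) is riskier: None becomes False, but NaN becomes True, because bool(float('nan')) is True.

```python
import pandas as pd

# Same data as in the question: "buybox" ends up with dtype object.
df = pd.DataFrame({"buybox": [None], "buybox_y": [True]})

# Option 1: elementwise comparison with True gives a plain bool Series;
# anything not equal to True (None, NaN, ...) becomes False.
buybox_bool = df["buybox"].eq(True)
either = buybox_bool | df["buybox_y"]
# Both operands are now bool, so the operand order no longer matters.
swapped = df["buybox_y"] | buybox_bool

# Option 2: nullable "boolean" dtype; None becomes <NA>, and | uses
# three-valued (Kleene) logic, where NA | True is True.
buybox_nullable = df["buybox"].astype("boolean")
kleene = buybox_nullable | df["buybox_y"]

print(either.tolist(), swapped.tolist(), kleene.tolist())
```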

score -1 · Answer 2 · answered Apr 15 '21 at 05:17

For Boolean objects (i.e. Py_True and Py_False), CPython's implementation of `or` takes a fast path; for other objects, PyObject_IsTrue() is used to compute an int truth value.

During this computation, PyObject_IsTrue() consults the nb_bool, mp_length, and sq_length slots in turn, which correspond to the `__bool__()` and `__len__()` magic methods.
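The lookup order described there (`__bool__()` first, then `__len__()`) can be observed from Python itself. A small sketch with two hypothetical classes; note that this illustrates truth testing as used by `or`, not the pandas | operator.

```python
# Hypothetical classes illustrating Python's truth-value test:
# __bool__ is tried first, then __len__ (non-zero length means truthy).

class OnlyLen:
    """Defines __len__ but not __bool__, so truth falls back to len() != 0."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class BoolWins:
    """Defines both; __bool__ takes precedence over __len__."""
    def __bool__(self):
        return False

    def __len__(self):
        return 3


checks = {
    "None": bool(None),                  # None is always falsy
    "empty OnlyLen": bool(OnlyLen([])),  # len() == 0 -> False
    "full OnlyLen": bool(OnlyLen([1])),  # len() == 1 -> True
    "BoolWins": bool(BoolWins()),        # __bool__ says False despite len 3
}
print(checks)

# `or` uses the same test to decide which operand to return.
picked = OnlyLen([]) or "fallback"
print(picked)  # fallback
```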

Conjure.Li

This may well be true and interesting information about how `or` works in CPython, but the issue in this question is entirely different, because it's how the `|` operator between two pandas Series works, which is a completely different implementation and doesn't match either pure Python `or` or `|`. – Tim Apr 20 '21 at 10:17
