Why does "in" work for a pandas Series in a list comprehension and not as a logical expression

Question

If I want to loop through values in a Series, I can do that using the in operator

import pandas as pd

[x for x in pd.Series(['Hello, World!'])]

> ['Hello, World!']

but if I use in to check if Hello, World! is in the Series, it returns False.

'Hello, World!' in pd.Series(['Hello, World!'])

> False

Paradoxically (to the untrained eye), this behavior makes the following list comprehension return empty:

hello_series = pd.Series(['Hello, World!'])

[x for x in hello_series if x in hello_series]

> []

This is Series-specific behavior; it of course works fine with lists:

'Hello, World!' in ['Hello, World!']

> True

Why does in work in one context and not the other with Series, and for what reason(s)?

Does this answer your question? [How to determine whether a Pandas Column contains a particular value](https://stackoverflow.com/questions/21319929/how-to-determine-whether-a-pandas-column-contains-a-particular-value) — buran, Feb 17 '22 at 21:10
`print('Hello, World!' in pd.Series(['Hello, World!']).values)` — buran, Feb 17 '22 at 21:10
Here is a pretty good answer: https://stackoverflow.com/questions/49393053/using-in-operator-with-pandas-series/49393472 — jch, Feb 17 '22 at 21:32
@jch Thanks, this is right on the practical side of the question. — semblable, Feb 18 '22 at 14:47
@KristianCanler, actually the very first line of the accepted answer addresses the **why** question: **`in` on a Series checks whether the value is in the index**. — buran, Feb 18 '22 at 15:01
Why does it do that is the question I'm asking. I'm looking to understand the behavior at a design level rather than just a literal level. If there's something in SO guidelines about not asking design questions I can just edit that out and close my question as duplicate. — semblable, Feb 18 '22 at 15:18

score 1 · Answer 1 · answered Feb 17 '22 at 21:58

I'm not quite sure if you're asking a practical question or a theoretical one. The theoretical answer is that whoever wrote the pandas code made a specific design decision.

- Python interprets `x in thing` by calling `thing.__contains__(x)`.
- Python interprets `for x in thing:` by creating an iterator for `thing` and then getting items from that iterator until it raises `StopIteration` to signal that it has run out of items. A thing can either implement `__iter__` to be explicit about its iterator, or Python can fall back to the older sequence protocol when the thing implements `__getitem__` for integer indices starting at 0 (it asks for `thing[0]`, `thing[1]`, and so on until `IndexError` is raised).
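The two protocols above can be sketched with a pair of hypothetical classes (`Labeled` and `GetItemOnly` are made up for illustration and are not part of pandas). `Labeled` deliberately makes `__contains__` look at positions while `__iter__` yields values, which is roughly the split the question runs into:

```python
class Labeled:
    """Hypothetical container: `in` tests positions, iteration yields values."""

    def __init__(self, values):
        self._values = list(values)

    def __contains__(self, label):
        # Called for `label in obj`: only integer positions count as members.
        return isinstance(label, int) and 0 <= label < len(self._values)

    def __iter__(self):
        # Called for `for x in obj`: yields the stored values.
        return iter(self._values)


class GetItemOnly:
    """Hypothetical class without __iter__; iteration falls back to __getitem__."""

    def __getitem__(self, i):
        # Python calls this with 0, 1, 2, ... until IndexError is raised.
        if i >= 3:
            raise IndexError(i)
        return i * 10


box = Labeled(["Hello, World!"])
values = [x for x in box]            # goes through __iter__
by_value = "Hello, World!" in box    # goes through __contains__
by_label = 0 in box                  # goes through __contains__
fallback = list(GetItemOnly())       # goes through the __getitem__ fallback
print(values, by_value, by_label, fallback)
```

Here the comprehension sees the value, but the membership test only recognizes the position `0`.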

Both of these constructs have `in` in their syntax, which obviously suggests that they're related. But their implementations for a specific object can have nothing to do with each other.
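For a Series specifically, the split can be seen directly. This is a minimal sketch, assuming pandas is installed and the Series has the default integer index; it relies on the behavior noted in the comments under the question, namely that `in` on a Series checks the index rather than the data:

```python
import pandas as pd

hello_series = pd.Series(["Hello, World!"])

# Iteration goes through __iter__, which yields the values.
iterated = list(hello_series)

# Membership goes through __contains__, which checks the index labels.
value_in_series = "Hello, World!" in hello_series
label_in_series = 0 in hello_series

# To test against the data, look at the values explicitly.
value_in_values = "Hello, World!" in hello_series.values
value_via_isin = bool(hello_series.isin(["Hello, World!"]).any())

print(iterated, value_in_series, label_in_series, value_in_values, value_via_isin)
```

Checking `hello_series.values` or using `isin` makes the intent explicit, so the result does not depend on what the index happens to contain.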

Frank Yellin

The second bullet doesn't quite spell out why `__iter__` doesn't work on the Series, but the answer @jch posted spells that out. So then the "theoretical" question ["for what reasons"] would be why pandas was designed so that pd.Series._info_axis contains information about range/index and not the data itself – semblable Feb 18 '22 at 14:59
The second theoretical question is why Python is set up where one (operator?) (`in`) is used to call two different (methods?) instead of having two different operators. That's Python behavior, right, not something specific to pandas? – semblable Feb 18 '22 at 15:01
