If I want to loop through the values in a Series, I can do that using `in`:

import pandas as pd

[x for x in pd.Series(['Hello, World!'])]

> ['Hello, World!']

but if I use `in` to check whether 'Hello, World!' is in the Series, it returns False:

'Hello, World!' in pd.Series(['Hello, World!'])

> False

Paradoxically (to the untrained eye), this behavior makes the following list comprehension return an empty list:

hello_series = pd.Series(['Hello, World!'])

[x for x in hello_series if x in hello_series]

> []

This is Series-specific behavior; it of course works fine with lists:

'Hello, World!' in ['Hello, World!']

> True

Why does `in` work in one context and not the other with a Series, and for what reason(s)?

semblable
  • Does this answer your question? [How to determine whether a Pandas Column contains a particular value](https://stackoverflow.com/questions/21319929/how-to-determine-whether-a-pandas-column-contains-a-particular-value) – buran Feb 17 '22 at 21:10
  • `print('Hello, World!' in pd.Series(['Hello, World!']).values)` – buran Feb 17 '22 at 21:10
  • Here is a pretty good answer: https://stackoverflow.com/questions/49393053/using-in-operator-with-pandas-series/49393472 – jch Feb 17 '22 at 21:32
  • @jch Thanks, this is right on the practical side of the question. – semblable Feb 18 '22 at 14:47
  • @KristianCanler, actually the very first line of the accepted answer addresses the **why** question: **`in` on a Series checks whether the value is in the index**. – buran Feb 18 '22 at 15:01
  • Why it does that is the question I'm asking. I'm looking to understand the behavior at a design level rather than just a literal level. If there's something in the SO guidelines about not asking design questions, I can just edit that out and close my question as a duplicate. – semblable Feb 18 '22 at 15:18

1 Answer

I'm not quite sure whether you're asking a practical question or a theoretical one. The theoretical answer is that whoever wrote the pandas code made a specific design decision.

  • Python interprets `x in thing` by calling `thing.__contains__(x)`.

  • Python interprets `for x in thing:` by creating an iterator for `thing` and then getting items from that iterator until it raises `StopIteration` to indicate that it has run out of items. A thing can either implement `__iter__` to be explicit about its iterator, or Python can sometimes infer one: if the thing implements `__getitem__`, Python calls `thing[0]`, `thing[1]`, and so on until `IndexError` is raised.
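To make the two bullets concrete, here is a minimal sketch in plain Python. The class `LabeledValues` is hypothetical and is not how pandas is implemented; it only assumes a container whose iteration yields values while its membership test looks at positional labels, which is enough to reproduce the empty list comprehension from the question.

```python
class LabeledValues:
    """Hypothetical container: iterates over values, but `in` tests labels."""

    def __init__(self, values):
        self._values = list(values)
        self._labels = range(len(self._values))  # labels 0, 1, 2, ...

    def __iter__(self):
        # Used by `for x in thing` and by comprehensions.
        return iter(self._values)

    def __contains__(self, item):
        # Used by `item in thing`; deliberately checks labels, not values.
        return item in self._labels


thing = LabeledValues(["Hello, World!"])

iterated = [x for x in thing]                # goes through __iter__
has_value = "Hello, World!" in thing         # goes through __contains__
has_label = 0 in thing                       # goes through __contains__
filtered = [x for x in thing if x in thing]  # uses both, one after the other

print(iterated, has_value, has_label, filtered)
# ['Hello, World!'] False True []
```

A `dict` behaves in a similar spirit: `key in d` tests keys, so a value stored in the dict is not found by `in` unless it also happens to be a key.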

The fact that both of these constructs have `in` in their syntax suggests that they're related, but their implementations for a specific object can have nothing to do with each other.
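Applied to the Series from the question, a short check along these lines shows the split. It relies on what the comments and the linked answer report, namely that `in` on a Series tests the index; the variable names are only illustrative, and `.values` and `Series.isin` are two of several ways to test the data instead.

```python
import pandas as pd

hello_series = pd.Series(["Hello, World!"])

# `in` on a Series tests the index labels (here the default label 0), not the data.
label_hit = 0 in hello_series
value_via_in = "Hello, World!" in hello_series

# Two ways to test membership among the values instead.
value_via_values = "Hello, World!" in hello_series.values
value_via_isin = bool(hello_series.isin(["Hello, World!"]).any())

print(label_hit, value_via_in, value_via_values, value_via_isin)
# True False True True
```

Iteration (`for x in hello_series`) still yields the values, which is why the comprehension in the question loops over `'Hello, World!'` and then fails the `in` test.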

Frank Yellin
  • The second bullet doesn't quite spell out why `__iter__` doesn't work on the Series, but the answer @jch posted spells that out. So then the "theoretical" question ["for what reasons"] would be why pandas was designed so that pd.Series._info_axis contains information about range/index and not the data itself – semblable Feb 18 '22 at 14:59
  • The second theoretical question is why Python is set up where one (operator?) (`in`) is used to call two different (methods?) instead of having two different operators. That's Python behavior, right, not something specific to pandas? – semblable Feb 18 '22 at 15:01