
Most Python data structures treat the `in` operator consistently: membership (`x in a`) and iteration (`for x in a`) both operate on the same thing, either keys or values. A Pandas Series mixes the two. My question is why are they mixed? Is there a functional purpose to this?

To be clear, I'm not asking how the mechanics work; that has been asked multiple times like here and here. I am asking why it was implemented in this (arguably counterintuitive) way.

I say counterintuitive because it results in behavior like this:

import pandas as pd
import numpy as np

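# Build a 3x3 frame of ones, then replace every 1 with the string 'abc'
df = pd.DataFrame(np.ones((3, 3)), columns=['a', 'b', 'c'])
df.replace(1, 'abc', inplace=True)

# a has index labels 0, 1, 2 and values 'abc', 'abc', 'abc'
a = df['a']
# Iteration yields the values, but `in` tests against the index labels
print([x in a for x in a])

# This will print
# [False, False, False]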

Every other data structure I can think of (including Pandas DataFrames) will return a list of True rather than False since membership (x in a) and iteration (for x in a) operate on the same item, either keys (like dictionaries) or values (like lists).
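To make the comparison concrete, here is a small sketch that runs the same `[x in a for x in a]` check on a list, a dict, a DataFrame and a Series. The variable names and the sample data are my own, chosen only for illustration; the Series uses the default integer index, as in the example above.

```python
import pandas as pd

values = ["abc", "abc", "abc"]

as_list = list(values)
as_dict = dict(enumerate(values))                      # keys 0, 1, 2
as_frame = pd.DataFrame({"a": values, "b": values})    # column labels 'a', 'b'
as_series = pd.Series(values)                          # index labels 0, 1, 2

# list: iterates over values, membership tests values
list_result = [x in as_list for x in as_list]
# dict: iterates over keys, membership tests keys
dict_result = [x in as_dict for x in as_dict]
# DataFrame: iterates over column labels, membership tests column labels
frame_result = [x in as_frame for x in as_frame]
# Series: iterates over values, membership tests index labels
series_result = [x in as_series for x in as_series]

print(list_result)    # [True, True, True]
print(dict_result)    # [True, True, True]
print(frame_result)   # [True, True]
print(series_result)  # [False, False, False]
```

Only the Series comes back all False, because `'abc'` is a value and not one of the index labels 0, 1, 2.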

TLDR: The Series object iterates over values and checks membership in keys. What's the reasoning behind this implementation?


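P.S. For anyone landing here wondering how to check membership against the values, simply use the `.values` attribute on the Series object (it is an attribute, not a method), e.g. `x in a.values`. To iterate over the keys instead, use `.index` or `.keys()`.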
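A minimal sketch of those workarounds, assuming a Series with the default integer index (the names `in_index`, `in_values`, `any_match` and `keys` are just for illustration):

```python
import pandas as pd

a = pd.Series(["abc", "abc", "abc"])

# Plain `in` tests the index labels (0, 1, 2), so a value is not found
in_index = "abc" in a
# Testing against .values checks the underlying data instead
in_values = bool("abc" in a.values)
# isin() is a vectorized alternative that also looks at the values
any_match = bool(a.isin(["abc"]).any())
# .index (or .keys()) gives the keys to iterate over
keys = list(a.index)

print(in_index)   # False
print(in_values)  # True
print(any_match)  # True
print(keys)       # [0, 1, 2]
```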

ThatNewGuy
  • I'm not sure SO is the right platform for this. This is an implementation detail that can be best answered by the pandas dev team. – wkgrcdsam Apr 09 '21 at 13:51
  • @wkgrcdsam True, this is less of a usage question and more a general inquiry. Per their [Getting Help](https://github.com/pandas-dev/pandas#getting-help), I posted to the [PyData mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata). – ThatNewGuy Apr 09 '21 at 15:49

0 Answers