Most data structures use Python's `in` operator to test membership against the same thing they iterate over: either keys or values. Pandas Series mixes these. My question is: why are they mixed? Is there a functional purpose to this?
To be clear, I'm not asking how the mechanics work; that has been asked multiple times, for example here and here. I am asking why it was implemented in this (arguably counterintuitive) way.
I say counterintuitive because it results in behavior like this:
```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.ones((3, 3)), columns=['a', 'b', 'c'])
df.replace(1, 'abc', inplace=True)
a = df['a']
print([x in a for x in a])
# This will print
# [False, False, False]
```
Every other data structure I can think of (including Pandas DataFrames) will return a list of `True` rather than `False`, since membership (`x in a`) and iteration (`for x in a`) operate on the same item, either keys (like dictionaries) or values (like lists).
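To make the comparison concrete, here is a minimal sketch that runs the same "is every iterated item a member?" check on a dict, a list, a DataFrame, and a Series. The variable names and sample data are made up for illustration.

```python
import pandas as pd

d = {"x": 1, "y": 2}                           # dict: iterates keys, `in` tests keys
lst = ["x", "y"]                               # list: iterates values, `in` tests values
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # DataFrame: both use column labels
s = pd.Series(["abc", "abc", "abc"])           # Series: iterates values, `in` tests the index

dict_result = [k in d for k in d]
list_result = [v in lst for v in lst]
df_result = [c in df for c in df]
series_result = [v in s for v in s]

print(dict_result)    # [True, True]
print(list_result)    # [True, True]
print(df_result)      # [True, True]
print(series_result)  # [False, False, False]
```

Only the Series produces `False`, because the strings it yields during iteration are looked up among its index labels (0, 1, 2 here).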
TLDR: The Series object iterates over values and checks membership in keys. What's the reasoning behind this implementation?
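P.S. For anyone landing here wondering how to iterate over the keys, use the Series object's `.index` attribute (or the `.keys()` method, which returns the same index). To check membership in the values instead, use `x in a.values`; note that `.values` is an attribute, not a method.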
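A short sketch of those workarounds, assuming a small made-up Series with a non-default index so that keys and values are easy to tell apart:

```python
import pandas as pd

# Hypothetical example data: labels 10, 20, 30 are the keys.
s = pd.Series(["abc", "abc", "abc"], index=[10, 20, 30])

keys = list(s.index)             # iterate over the keys (index labels)
also_keys = list(s.keys())       # .keys() returns the same index
values = list(s)                 # plain iteration yields the values

key_member = 10 in s             # `in` on the Series tests the index
value_member = "abc" in s.values # test membership among the values
pairs = list(s.items())          # (key, value) pairs, like dict.items()

print(keys, values, key_member, value_member)
```

Being explicit about `.index` or `.values` on both sides of the check avoids the mismatch between iteration and membership.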