I discovered the hard way that the `in` operator, when applied to a pandas Series, tests the index and not the actual data:

In [1]: import pandas as pd

In [2]: x = pd.Series([1, 2, 3])

In [3]: x.index = [10, 20, 30]

In [4]: x
Out[4]:
10    1
20    2
30    3
dtype: int64

In [5]: 1 in x
Out[5]: False


In [6]: 10 in x
Out[6]: True

My intuition is that the series x contains the number 1, not the index label 10, but that is apparently wrong. What is the reason behind this behavior? Are the following approaches the best possible alternatives?

In [7]: 1 in set(x)
Out[7]: True

In [8]: 1 in list(x)
Out[8]: True

In [9]: 1 in x.values
Out[9]: True

UPDATE

I timed my three suggestions. It looks like x.values is the fastest by a wide margin:

In [21]: x = pd.Series(np.random.randint(0, 100000, 1000))

In [22]: x.index = np.arange(900000, 900000 + 1000)

In [23]: x.tail()
Out[23]:
900995    88999
900996    13151
900997    25928
900998    36149
900999    97983
dtype: int64

In [24]: %timeit 36149 in set(x)
10000 loops, best of 3: 190 µs per loop

In [25]: %timeit 36149 in list(x)
1000 loops, best of 3: 638 µs per loop

In [26]: %timeit 36149 in (x.values)
100000 loops, best of 3: 6.86 µs per loop
Boris Gorelik

1 Answer

It may be helpful to think of a pandas.Series as being a bit like a dictionary, where the index values are equivalent to the keys. Compare:

>>> d = {'a': 1}
>>> 1 in d
False
>>> 'a' in d
True

with:

>>> s = pandas.Series([1], index=['a'])
>>> 1 in s
False
>>> 'a' in s
True

However, note that iterating over the series iterates over the data, not the index, so list(s) would give [1], not ['a'].
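
To make the analogy concrete, here is a minimal sketch that reuses the Series from the question and contrasts membership on the index with a few explicit ways of testing the values. The variable names are only illustrative, and this is not a claim about which option is fastest for your data:

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=[10, 20, 30])

# `in` on a Series tests the index labels, like `in` on a dict tests keys.
label_hit = 10 in x        # 10 is an index label
value_via_in = 1 in x      # 1 is a value, not a label

# Explicit ways to test the values instead.
in_values = 1 in x.values      # membership on the underlying NumPy array
via_isin = x.isin([1]).any()   # vectorized; also accepts several candidates
via_eq = (x == 1).any()        # element-wise comparison, then reduce

# Iteration yields the values, not the labels (unlike a dict).
iterated = list(x)
```

If you do want the dictionary-style check, spelling it as `10 in x.index` makes the intent obvious to whoever reads the code next.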

Indeed, per the documentation, the index values "must be unique and hashable", so I'd guess there's a hashtable under there somewhere.
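
On the uniqueness point, a quick check suggests that pandas does accept repeated index labels in practice, and that `in` still works on such a Series. This is only a sketch of observable behavior with a made-up two-element Series, and it says nothing about how the lookup is implemented internally:

```python
import pandas as pd

# Hypothetical example: the same label 'a' is used twice.
s = pd.Series([1, 2], index=['a', 'a'])

has_label = 'a' in s             # membership still tests the labels
is_unique = s.index.is_unique    # reports whether the labels are all distinct
values_for_a = list(s['a'])      # selecting a repeated label returns every match
```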

jonrsharpe
  • Strange, both my experience and [another pandas doc page](http://pandas.pydata.org/pandas-docs/dev/dsintro.html) say that pandas supports non-unique indices – Boris Gorelik Jul 20 '14 at 05:41
  • It looks like that particular piece of documentation is outdated. I posted a bug report. – Boris Gorelik Jul 20 '14 at 06:18
  • @bgbg which piece; the one I linked or the one you linked? – jonrsharpe Jul 20 '14 at 08:45
  • Thinking of `index` values as `keys` is spot on. But then why wouldn't membership _also_ check keys? That's how every other data type operates. Series is the only one where `x in S for x in S` is going to be `False`. – ThatNewGuy Apr 09 '21 at 12:38