I am trying to understand why 5 in df['ids'] in the code below returns True, since the number 5 doesn't exist in the pandas DataFrame.

In [82]: df = pd.DataFrame()

In [83]: df['ids'] = list(range(5)) + list(range(6,10))

In [84]: df
Out[84]: 
   ids
0    0
1    1
2    2
3    3
4    4
5    6
6    7
7    8
8    9

In [85]: 5 in df['ids']
Out[85]: True

In [86]: df[df['ids'] == 5]
Out[86]: 
Empty DataFrame
Columns: [ids]
Index: []

In [87]: 5 in list(df['ids'])
Out[87]: False

A.Razavi

1 Answer

The trick here is to understand that pandas objects care a lot about their .index. Because of the .index, a Series supports near dictionary-like behavior. From this perspective, just as using in on a dictionary checks whether a key exists, it makes some sense that using in on a pandas object checks whether an index label exists.

So using the above, we can check:

>>> import pandas as pd
>>> s = pd.Series([1,2,3,4,6,7,8,9])
>>> s.index
RangeIndex(start=0, stop=8, step=1)

>>> 5 in s # implicitly check if 5 is in s.index
True
>>> 5 in s.index # explicitly check if 5 is in s.index
True

>>> 5 in s.values # explicitly check if 5 is in values
False
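
To make the dictionary analogy concrete, here is a small sketch. The string labels and the variable names are made up for illustration and are not from the question; the point is only that in follows the index labels, the same way it follows the keys of a dict:

```python
import pandas as pd

# Hypothetical labels, chosen so that labels and values cannot be confused
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
d = {"a": 10, "b": 20, "c": 30}

label_in_series = "a" in s        # tests the index labels
label_in_dict = "a" in d          # tests the dict keys
value_in_series = 10 in s         # 10 is a value, not a label
value_in_values = 10 in s.values  # tests the underlying values

print(label_in_series, label_in_dict, value_in_series, value_in_values)
```

With a default RangeIndex the labels happen to be integers, which is why 5 in s looks like a value check even though it is a label check.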

Alternatively, to check whether a value is in a Series, you can use some boolean logic:

>>> (5 == s).any()
False
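
Applied to the DataFrame from the question, the same idea might look like the sketch below. Series.isin is a swapped-in alternative to the == comparison above, not something the question used, and the variable names are only illustrative:

```python
import pandas as pd

# Rebuild the DataFrame from the question: 0..4 and 6..9, so no value 5
df = pd.DataFrame({"ids": list(range(5)) + list(range(6, 10))})

in_index = 5 in df["ids"]                # checks index labels 0..8
in_values = df["ids"].isin([5]).any()    # checks the values in the column
in_array = 5 in df["ids"].to_numpy()     # checks the underlying NumPy array
has_six = (df["ids"] == 6).any()         # 6 is present as a value

print(in_index, in_values, in_array, has_six)
```

isin is convenient when testing several candidate values at once, since it accepts a list.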

See also this answer.

Cameron Riddell