I ran into a surprising result when checking whether an integer ID is present in a pandas column of integers, where I knew the number was in the column. I've boiled this down to a very simple test case that still baffles me, so I must be missing something obvious.
Here is how I reproduced the problem:
import numpy as np
import pandas as pd
# Create two DataFrames; col_2 holds np.int64 values in each
source_series_1 = pd.DataFrame({'col_1': ['a', 'b', 'c', 'd'], 'col_2': np.int64([1, 2, 3, 4])})
source_series_2 = pd.DataFrame({'col_1': ['a', 'b', 'c', 'd'], 'col_2': np.int64([101, 102, 103, 104])})
Now test membership in col_2 of each DataFrame:
# Test membership in the pandas Series (the col_2 column)
print(np.int64(2) in source_series_1.col_2)
print(np.int64(102) in source_series_2.col_2)
output:
True
False # ?!
# But convert the column to a plain list first...
print(np.int64(2) in list(source_series_1.col_2))
print(np.int64(102) in list(source_series_2.col_2))
output:
True
True
I get the same results from the Series test without the explicit np.int64 cast:
print(2 in source_series_1.col_2)  # True
print(102 in source_series_2.col_2)  # False
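One more comparison that may help narrow it down. This is only a sketch, assuming the default integer index (0 to 3); `s` is a stand-in name introduced here for illustration and is not part of the original test. It checks membership against the Series itself, its index, and its values separately:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for source_series_2.col_2, with the default index 0..3
s = pd.Series(np.array([101, 102, 103, 104], dtype=np.int64))

in_series = 102 in s          # same test as in the question
in_index = 102 in s.index     # test against the index labels only
in_values = 102 in s.values   # test against the underlying NumPy array
label_hit = 2 in s            # 2 is an index label here, not a stored value

print(in_series, in_index, in_values, label_hit)
# False False True True
```

If this holds, the 'in' test on a Series lines up with the index labels rather than the stored values, which would be consistent with 2 being found and 102 not.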
There is clearly something simple going on that I am missing or forgetting. Why does source_series_2.col_2 fail the 'in' test when the value is plainly in the column?