Why are items in a pandas column not in the pandas column that they're in?

Question

If I have a dataframe of random values:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,10, size=[5, 5]))

and I choose any column and run the following code:

for x in df[0]:
    print(x in df[0])

I would expect the output to be:

True
True
True
True
True

but it isn't. It prints an assortment of "True" and "False". It seems to be printing "True" for items in range(5) and False otherwise. I tried changing my dataframe to:

df = pd.DataFrame(np.random.randint(0,10, size=[6, 6]))

and the same code prints True for elements in range(6). If I change the condition to:

for x in df[0]:
    print(x in list(df[0]))

It prints all True as expected (regardless of the size of the dataframe). Can anyone explain why this is?

It is checking that the item is in the index. There is a good explanation here: https://stackoverflow.com/questions/21319929/how-to-determine-whether-a-pandas-column-contains-a-particular-value — Barrendeitor, Oct 24 '20 at 08:30

score 1 · Accepted Answer · answered Oct 24 '20 at 09:07

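df[0] is a pandas Series object (<class 'pandas.core.series.Series'>). The expression x in df[0] does not test the values of the Series; like a dict, a Series tests membership against its index. With the default index of a 5-row dataframe the labels are 0 through 4, which is why the loop in the question prints True only for values in range(5).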

example:

df = pd.DataFrame(np.arange(25).reshape(5,5))
#    0   1   2   3   4
#0   0   1   2   3   4
#1   5   6   7   8   9
#2  10  11  12  13  14
#3  15  16  17  18  19
#4  20  21  22  23  24

print(10 in df[0])
#False
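To make the index-versus-values distinction more visible, here is a small sketch using a Series with string labels. The labels and values are made up for illustration and are not part of the question's data:

```python
import pandas as pd

# Hypothetical Series: values 10, 20, 30 stored under labels "a", "b", "c"
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

label_found = "a" in s   # "a" is an index label
value_found = 10 in s    # 10 is a value, not a label

print(label_found)  # True
print(value_found)  # False
```

This mirrors how key in some_dict looks at a dict's keys rather than its values.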

However, list(df[0]) returns a plain Python list of the Series values, so x in list(df[0]) checks whether x is among the values. The same holds for df[0].values, df[0].to_numpy(), and set(df[0]), all of which contain the values rather than the index.

print(10 in list(df[0]))
#True
print(10 in set(df[0]))
#True
print(10 in df[0].values)
#True
print(10 in df[0].to_numpy())
#True
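If you would rather stay within pandas than convert to a list or array, a vectorized comparison is one option. The following is a minimal sketch on the same example dataframe; the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5))

# Column 0 holds the values 0, 5, 10, 15, 20
has_10_eq = bool((df[0] == 10).any())        # element-wise comparison, then any()
has_10_isin = bool(df[0].isin([10]).any())   # isin accepts a list of candidate values
has_7 = bool(df[0].isin([7]).any())          # 7 is not in column 0

print(has_10_eq)    # True
print(has_10_isin)  # True
print(has_7)        # False
```

isin is convenient when checking several candidate values at once, since it takes a list.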
