If I have a dataframe of random values:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,10, size=[5, 5]))

and I choose any column and run the following code:

for x in df[0]:
    print(x in df[0])

I would expect the output to be:

True
True
True
True
True

but it isn't. It prints a mix of True and False: True for values that fall in range(5) and False otherwise. I tried changing my dataframe to:

df = pd.DataFrame(np.random.randint(0,10, size=[6, 6]))

and the same code prints True for elements in range(6). If I change the condition to:

for x in df[0]:
    print(x in list(df[0]))

It prints all True as expected (regardless of the size of the dataframe). Can anyone explain why this is?

Linden
    It is checking that the item is in the index. There is a good explanation here: https://stackoverflow.com/questions/21319929/how-to-determine-whether-a-pandas-column-contains-a-particular-value – Barrendeitor Oct 24 '20 at 08:30

1 Answer

df[0] is a pandas Series (<class 'pandas.core.series.Series'>). For a Series, x in df[0] tests whether x is one of the index labels, not whether it is one of the values.

Example:

df = pd.DataFrame(np.arange(25).reshape(5,5))
#    0   1   2   3   4
#0   0   1   2   3   4
#1   5   6   7   8   9
#2  10  11  12  13  14
#3  15  16  17  18  19
#4  20  21  22  23  24

print(10 in df[0])
#False
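
The index check is easier to see when the labels cannot be mistaken for values. A minimal sketch, assuming a hypothetical Series s with string labels (not part of the original question):

```python
import pandas as pd

# Hypothetical Series: string labels make index and values easy to tell apart.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

label_found = "a" in s   # "a" is an index label
value_found = 10 in s    # 10 is a value, not a label

print(label_found)  # True
print(value_found)  # False
```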

However, list(df[0]) returns a list of the Series values (df[0].values, df[0].to_numpy() and set(df[0]) likewise hold the values), so x in list(df[0]) checks whether x is among the values.

print(10 in list(df[0]))
#True
print(10 in set(df[0]))
#True
print(10 in df[0].values)
#True
print(10 in df[0].to_numpy())
#True
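
As a rough sketch of how this plays out for the loop in the question, the snippet below reuses the np.arange dataframe from above and also shows two vectorized value checks, (col == 10).any() and col.isin([10]).any(). The variable names are illustrative only, and which check to prefer is a matter of taste rather than something the question settles.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5))
col = df[0]  # values 0, 5, 10, 15, 20 with default index labels 0..4

# `in` on a Series tests the index, so only the value 0 matches a label.
index_hits = [x in col for x in col]
print(index_hits)  # [True, False, False, False, False]

# Vectorized checks against the values, without building a list or set first.
has_ten_eq = bool((col == 10).any())
has_ten_isin = bool(col.isin([10]).any())
print(has_ten_eq, has_ten_isin)  # True True
```

This matches what the question observed: with a default index on a 5-row dataframe, only values that happen to lie in range(5) report True.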
Ehsan