
I'm checking whether all the values in one column of one dataframe lie in a column of another dataframe. When I run the code below, it prints 4, i.e. it reports that 4 does not exist in df1. Is there any particular reason for this?

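```python
import pandas as pd

list1 = [1, 2, 3, 4]
list2 = [1, 2, 3, 4]

df2 = pd.DataFrame(list2)
df2.rename(columns={0: "List2"}, inplace=True)

df1 = pd.DataFrame(list1)
df1.rename(columns={0: "List1"}, inplace=True)

# Print every value of df2['List2'] that is (supposedly) missing from df1['List1']
for i in df2['List2']:
    if i not in df1['List1']:
        print(i)
```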
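A quick way to see what the `if i not in df1['List1']` test is actually checking is to compare `in` against iteration on the same Series. The sketch below builds `df1` in one step with a dict instead of `rename`, which gives the same column; it only illustrates the behaviour, it is not the asker's code.

```python
import pandas as pd

# Same data as in the question, built in one step
df1 = pd.DataFrame({"List1": [1, 2, 3, 4]})
s = df1["List1"]

print(s.index.tolist())  # [0, 1, 2, 3]

# `in` on a Series tests the index labels, like `in` on a dict tests its keys
print(4 in s)            # False: there is no index label 4
print(3 in s)            # True: index label 3 exists (the value stored there is 4)

# Iteration, by contrast, yields the values
print(list(s))           # [1, 2, 3, 4]

# Testing membership among the values requires checking the underlying array
print(4 in s.values)     # True
```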
dyl
  • Use `pd.Series.isin`: `df2['List2'].isin(df1['List1'])` is all `True`. – Scott Boston Sep 07 '21 at 19:44
  • The `in` operator, when applied to a Series, checks the _index_, not the values. df1's index is `[0, 1, 2, 3]` (no 4). This is indeed confusing, since iterating over a Series does yield the values. But this is why 4 is "not in" `df1['List1']`. – Henry Ecker Sep 07 '21 at 19:45
  • Yes, I tried that too, which is why I'm so confused when the code above prints 4. – dyl Sep 07 '21 at 19:47
  • Use `set`: `not bool(set(df2['List2']).symmetric_difference(df1['List1']))` – Corralien Sep 07 '21 at 19:47
  • That's what I thought! Is there any way to change it so that it checks the actual entries in the dataframe? – dyl Sep 07 '21 at 19:49
  • All of the options in [this answer](https://stackoverflow.com/a/21320011/15497888) work. – Henry Ecker Sep 07 '21 at 19:49
  • Great! Thanks for this. I'll try them out now – dyl Sep 07 '21 at 19:51
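The fixes suggested in the comments can be sketched side by side. This is a minimal illustration using the question's data; the variable names (`all_present`, `missing`, etc.) are hypothetical and chosen only for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"List1": [1, 2, 3, 4]})
df2 = pd.DataFrame({"List2": [1, 2, 3, 4]})

# Option 1: vectorised membership test with Series.isin, then require all True
all_present = bool(df2["List2"].isin(df1["List1"]).all())

# Option 2: subset test on the distinct values (iterating a Series yields values)
all_present_set = set(df2["List2"]).issubset(df1["List1"])

# Option 3: keep the original loop, but test against the values, not the index
values1 = set(df1["List1"])
missing = [i for i in df2["List2"] if i not in values1]

# Values of df2 that are absent from df1, found with isin (empty here)
missing_isin = df2.loc[~df2["List2"].isin(df1["List1"]), "List2"].tolist()

print(all_present, all_present_set, missing, missing_isin)  # True True [] []
```

Note that the `symmetric_difference` check suggested above is stricter than a subset test: it is empty only when both columns contain exactly the same distinct values, so it would also report a mismatch if df1 held extra values that df2 lacks.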
