0

Test data: enter image description here

finaldata1: Abc, Nck, xkl, mzn

finaldata2: Abk, Nck, cnl, mzn

check col: cnl, Abk, Abc

expected result of "same" check:

finaldata1: Abc

finaldata2: Abk, cnl

I'm try to compare finaldata1 and finaldata2 against the check column. I'm doing this a column at a time, as I do not know how to combine the search using multiple columns. Then, if there are differences, print these as well (not also looking for which ones were different)

attempted code:

import pandas as pd


df = pd.read_excel('testComparisons.xlsx')


#going to check same ones first, and then later on do the differences


result1 = df['finaldata1'].isin(df['check'])

print(result1):

0     True
1    False
2    False
3    False

now, if I want to print out that above result, showing only Abc as the result, the following code doesn't result the correct result- I'm puzzled!

print(df[result1])


  finaldata1 finaldata2 check


0        Abc        Abk   cnl

I thought it should only return the first row, first column, as that was the only one that was true? Instead its returning the entire first row, not limited to the first column.

And then to reverse my code, I thought I would just need to add the tilde sign, ~, to it becomes "is not in".

I tried following How to filter Pandas dataframe using 'in' and 'not in' like in SQL, but that lead me really wrong as all the answers became false which is not correct when I ran the following:

result1 = df[['finaldata1','finaldata2']].isin(df['check'])


   finaldata1  finaldata2
0       False       False
1       False       False
2       False       False
3       False       False

1 Answers1

0

I am not sure, if I completly understood what you expect, but why don't you use sets for it?

finaldata1 = ['Abc', 'Nck', 'xkl', 'mzn']
finaldata2 = ['Abk', 'Nck', 'cnl', 'mzn']
    
check = ['cnl', 'Abk', 'Abc']
        
same_finaldata1 = set(finaldata1) & set(check)
same_finaldata2 = set(finaldata2) & set(check)

print (same_finaldata1, same_finaldata2)
am2
  • 11
  • 3