Test data: enter image description here
finaldata1: Abc, Nck, xkl, mzn
finaldata2: Abk, Nck, cnl, mzn
check col: cnl, Abk, Abc
expected result of "same" check:
finaldata1: Abc
finaldata2: Abk, cnl
I'm try to compare finaldata1 and finaldata2 against the check column. I'm doing this a column at a time, as I do not know how to combine the search using multiple columns. Then, if there are differences, print these as well (not also looking for which ones were different)
attempted code:
import pandas as pd
df = pd.read_excel('testComparisons.xlsx')
#going to check same ones first, and then later on do the differences
result1 = df['finaldata1'].isin(df['check'])
print(result1):
0 True
1 False
2 False
3 False
now, if I want to print out that above result, showing only Abc as the result, the following code doesn't result the correct result- I'm puzzled!
print(df[result1])
finaldata1 finaldata2 check
0 Abc Abk cnl
I thought it should only return the first row, first column, as that was the only one that was true? Instead its returning the entire first row, not limited to the first column.
And then to reverse my code, I thought I would just need to add the tilde sign, ~, to it becomes "is not in".
I tried following How to filter Pandas dataframe using 'in' and 'not in' like in SQL, but that lead me really wrong as all the answers became false which is not correct when I ran the following:
result1 = df[['finaldata1','finaldata2']].isin(df['check'])
finaldata1 finaldata2
0 False False
1 False False
2 False False
3 False False