I have two dataframes and need to keep only the rows whose name value (GN) is unique, i.e. does not appear in the other dataframe. This is what I have come up with so far, but I would like to know whether any improvements can be made.
import pandas as pd
sdf = pd.read_excel(r"C:\Users\fnafee\Desktop\DiffExp\Results\smokersfilteredreal.xlsx")
nsdf = pd.read_excel(r"C:\Users\fnafee\Desktop\DiffExp\Results\nonsmokersfilteredreal.xlsx")
list_of_S = sdf['GN'].tolist()
list_of_NS = nsdf['GN'].tolist()
refined_list_of_S = [x for x in list_of_S if not pd.isnull(x)]
refined_list_of_NS = [x for x in list_of_NS if not pd.isnull(x)]
unique_NS = nsdf[~nsdf['GN'].isin(refined_list_of_S)]
unique_S = sdf[~sdf['GN'].isin(refined_list_of_NS)]
I converted the name columns to lists and kept only the rows whose 'GN' was not in the other dataframe's list of names. I am aware of the symmetric_difference() method, but I don't believe it shows which list each value is unique to.
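For anyone who wants to reproduce this without the Excel files, here is a minimal sketch of the same filtering on toy data. The two small frames and the `val` column are made up for illustration. It also assumes that passing a Series with `dropna()` straight to `isin` is equivalent to building the intermediate lists, which appears to hold on this toy input.

```python
import pandas as pd

# Hypothetical toy frames standing in for the two Excel files
sdf = pd.DataFrame({"GN": ["A", "B", None, "C"], "val": [1, 2, 3, 4]})
nsdf = pd.DataFrame({"GN": ["B", "D", None], "val": [5, 6, 7]})

# Keep rows whose GN does not appear among the other frame's non-null names.
# Rows with a missing GN are kept, as in the list-based version.
unique_S = sdf[~sdf["GN"].isin(nsdf["GN"].dropna())]
unique_NS = nsdf[~nsdf["GN"].isin(sdf["GN"].dropna())]

# A plain set difference in each direction also shows which side a name
# is unique to, unlike a single symmetric difference.
only_S = set(sdf["GN"].dropna()) - set(nsdf["GN"].dropna())
only_NS = set(nsdf["GN"].dropna()) - set(sdf["GN"].dropna())
```

On this input, `only_S` and `only_NS` contain the same names as the non-null 'GN' values of `unique_S` and `unique_NS`, so the side each name belongs to is preserved.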