
I have two dataframes and need to keep only the rows whose name value (GN) is unique, i.e. does not appear in the other dataframe. This is what I have come up with so far, but I would like to know whether it can be improved.

import pandas as pd
sdf = pd.read_excel(r"C:\Users\fnafee\Desktop\DiffExp\Results\smokersfilteredreal.xlsx")
nsdf = pd.read_excel(r"C:\Users\fnafee\Desktop\DiffExp\Results\nonsmokersfilteredreal.xlsx")
list_of_S = sdf['GN'].tolist()
list_of_NS = nsdf['GN'].tolist()
refined_list_of_S = [x for x in list_of_S if pd.isnull(x) == False]
refined_list_of_NS = [x for x in list_of_NS if pd.isnull(x) == False]
unique_NS = nsdf[~nsdf['GN'].isin(refined_list_of_S)]
unique_S = sdf[~sdf['GN'].isin(refined_list_of_NS)]

I turned the name columns into lists and kept only the rows whose 'GN' wasn't in the other dataframe's list of names. I am aware of the set method symmetric_difference(), but I don't believe it shows which list each value is unique to.
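For reference, the same filtering can probably be written without the intermediate lists. The following is only a sketch under assumptions: the two small dataframes are made-up stand-ins for the Excel sheets, and `rows_unique_to` is a hypothetical helper name, not something from pandas. It relies on `Series.dropna()` and `Series.isin()`, and like the code above it keeps rows whose 'GN' is null.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel sheets; only the 'GN' column matters here.
sdf = pd.DataFrame({"GN": ["A", "B", None, "C"], "val": [1, 2, 3, 4]})
nsdf = pd.DataFrame({"GN": ["B", "D", None], "val": [5, 6, 7]})

def rows_unique_to(left, right, key="GN"):
    """Return rows of `left` whose `key` is not among the non-null keys of `right`."""
    other_keys = right[key].dropna()          # drop nulls, as the list comprehension did
    return left[~left[key].isin(other_keys)]  # keep rows with no match on the other side

unique_S = rows_unique_to(sdf, nsdf)
unique_NS = rows_unique_to(nsdf, sdf)
```

Because each result comes from filtering one specific dataframe, it stays clear which side every remaining row belongs to.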

  • 4
    Convert the lists to sets. Then you can use set subtraction. `set1 - set2` is all the items in `set1` that aren't shared. `set2 - set1` is all the items in `set2` that aren't shared. – Barmar Jun 28 '22 at 21:11
  • 2
    Does this answer your question? [Compute list difference](https://stackoverflow.com/questions/6486450/compute-list-difference) – BeRT2me Jun 28 '22 at 21:19
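The set subtraction suggested in the comments could look roughly like the sketch below. The sample names are invented for illustration; in the real case they would come from the two 'GN' columns. It also shows why the symmetric difference alone is not enough: it is just the union of the two one-sided differences, with the origin of each name lost.

```python
import pandas as pd

# Made-up name columns standing in for sdf['GN'] and nsdf['GN'].
s_names = pd.Series(["A", "B", None, "C"]).dropna()
ns_names = pd.Series(["B", "D", None]).dropna()

set_S, set_NS = set(s_names), set(ns_names)

only_S = set_S - set_NS    # names that appear only on the first side
only_NS = set_NS - set_S   # names that appear only on the second side

# The symmetric difference merges both one-sided results into a single set.
either_side = set_S ^ set_NS
assert either_side == only_S | only_NS
```

Passing `only_S` or `only_NS` to `isin()` on the matching dataframe would then select the corresponding rows.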
