0

I have two sample dfs as below:

df1
    Name    DOB
0   AMY     20100101
1   AMANDA  19990213
2   LEO     19920103
3   RIO     20200109
4   JEFF    20050314

df2
    Name    DOB
0   AMY     20100101
1   LEO     19920103
2   SEAN    19971123
3   BEN     20170119
4   SAM     20020615
5   YI      19930202
6   RICHAEE 19980919
7   MICHAEL 19920229

I want to compare the two dfs and the expected results look like:

Name      DOB
AMANDA    19990213
RIO       20200109
JEFF      20050314

I tried to use left join but didn't get what I expected

df1=pd.DataFrame({'Name':['AMY','AMANDA','LEO','RIO','JEFF'],
                  'DOB':['20100101','19990213','19920103','20200109','20050314']})

df2=pd.DataFrame({'Name':['AMY','LEO','SEAN','BEN','SAM','YI','RICHAEEL','MICHAEL'],
                  'DOB':['20100101','19920103','19971123','20170119','20020615','19930202','19980919','19920229']})

pd.merge(df1, df2, on='Name', how='left')

Can anyone help me? Thanks!

Bangbangbang
  • 560
  • 2
  • 12
  • Hi that's interesting, I wonder if this might be of interest https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe – IronMan Apr 17 '20 at 21:44
  • 1
    This is also known as an "anti-join" https://stackoverflow.com/questions/38516664/anti-join-pandas `pd.merge(df1, df2[['Name']], on='Name', how='left', indicator=True).loc[lambda df: df['_merge'] == 'left_only'].drop(columns='_merge')` – mcskinner Apr 17 '20 at 21:45
  • 1
    Use `df1[~df1['Name'].isin(df2['Name'])]` – ansev Apr 17 '20 at 21:46
  • @mcskinner thank you. Glad to learn the anti-join! Can you please post the solution to the answer and I can accept it? Thanks! – Bangbangbang Apr 17 '20 at 21:49
  • or `pd.merge(df1, df2[['Name']], on='Name', how='left', indicator=True).loc[lambda x: x['_query']=='left_only'].drop(columns='_merge')` – ansev Apr 17 '20 at 21:50

1 Answers1

0

This is also known as an "anti-join".

(pd.merge(df1, df2[['Name']], on='Name', how='left', indicator=True)
    .loc[lambda df: df['_merge'] == 'left_only']
    .drop(columns='_merge'))
#      Name       DOB
# 1  AMANDA  19990213
# 3     RIO  20200109
# 4    JEFF  20050314
mcskinner
  • 2,620
  • 1
  • 11
  • 21
  • this is duplicated od pandas merging 101, also don't copy solution in comments – ansev Apr 17 '20 at 21:51
  • @ansev OP asked for my comment to be translated to an answer for them. I see you have posted the same solution later. I think the valuable part of the answer was the vocabulary help to find "anti join", otherwise this is not easy to find. – mcskinner Apr 17 '20 at 21:53
  • Actually if you take a look, my comment predates both of yours by some minutes. Believe it or not, I came up with my solution independently. – mcskinner Apr 17 '20 at 23:55
  • also in this particular case it is much faster to use series.isin due to the simplicity of the dataframe. a boolean indexing with series.isin is much faster than using a complex method such as merge in addition to loc and drop – ansev Apr 18 '20 at 12:00