0

After doing a left merge of two dataframes with Pandas, I would like to produce a data frame consisting of the second/right dataframe's rows that were not successfully merged.

The dataframes, df1 and df2, are being merged based on two columns, entitled City and State. I did this to produce the merged dataframe, df3:

merged_df = pd.merge(df1, df2, how="left", left_on=['City','State'], right_on=['City','State'])

How do I produce an unmerged_df consisting of the df2 rows that were not successfully matched and merged with df1?

Thanks for your help!

  • I think you can filter df1 and df2 with the merged_df. Remove from df1 and df2 what is present in the merged – Ana Lívia Nov 03 '20 at 15:14

1 Answers1

0

You want the right excluding join.

Check this answer.

In your case it should be as simple as this (untested):

right_outer_df = (df1.merge(df2, on=['City','State'], how='right', indicator=True)
     .query('_merge == "right_only"')
     .drop('_merge', 1))
CAPSLOCK
  • 6,243
  • 3
  • 33
  • 56