3

There might be former questions but they do not show up when you search for them. This question and its title is very informative.

import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz'], 'x1': [1, 2, 3], 'x2': [11, 22, 33]})
                    
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'bazz'], 'y1': [1, 2, 3], 'y2': [111, 222, 333]})

merged_df = pd.merge(df1, df2, how= 'inner', left_on=['lkey','x1'], right_on=['rkey','y1'])

merged_df

How to show which rows of df1 are not merged?

Listing unmerged rows of df1 with columns 'lkey' and 'x1' will be very helpful.

 baz   3
burcak
  • 1,009
  • 10
  • 34
  • `indicator=True` see the [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) – Trenton McKinney Oct 17 '20 at 22:05
  • 1
    @TrentonMcKinney indicator=True only adds one more column as _merged but does not show the rows of df1 that are not merged. – burcak Oct 17 '20 at 22:07
  • `dfm = df1.merge(df2, how='outer', left_on=['lkey','x1'], right_on=['rkey','y1'], indicator=True)` then `dfm[dfm._merge == 'right_only']`, as in the 1st duplicate. – Trenton McKinney Oct 17 '20 at 22:13

1 Answers1

7

You can check if the df .isin the merged_df dataframe and then pass as a boolean series with ~ to return the rows that were not merged:

import pandas as pd
df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz'], 'x1': [1, 2, 3], 'x2': [11, 22, 33]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'bazz'], 'y1': [1, 2, 3], 'y2': [111, 222, 333]})
merged_df = pd.merge(df1, df2, how= 'inner', left_on=['lkey','x1'], right_on=['rkey','y1'])
df1[~df1.isin(merged_df)].dropna()
Out[1]: 
  lkey   x1    x2
2  baz  3.0  33.0

You could also do an outer merge and check with indicator=True per Trenton's comment or simply dropna on a subset of lkey with:

merged_df = pd.merge(df1, df2, how= 'outer', left_on=['lkey','x1'], right_on=['rkey','y1']).dropna(subset=['lkey'])
David Erickson
  • 16,433
  • 2
  • 19
  • 35