0

I have three data-frames defined df1,df2 and df3. I need a simple way to find the unmatched records between df1, df2 and df3. Its opposite of the intersection of these data frames. If I compare it with SQL world it can be achieved by left outer joins between these three tables and filter on null columns to get unmatched records. How this can be easily done in Python?

2 Answers2

0

If you want to join by a specific columns of the Dataframes you can do something like that:

indexes = df1[(df1[TARGET_COLUMN]!=df2[TARGET_COLUMN]) | (df1[TARGET_COLUMN]!=df3[TARGET_COLUMN])].index

Where the TARGET_COLUMN should be the name of the column.

If you want to compare between entire rows you can use:

def compare_indexes(x):
    index = x.name
    for column_name in df1.columns.values:
        if not (x[column_name] == df2.iloc[index,:][column_name] == df3.iloc[index,:][column_name]):
            return False
    return True


df1["match"] = df1.apply(compare_indexes, axis=1)
indexes = df1[df1["match"]==False].index
AndreyF
  • 1,798
  • 1
  • 14
  • 25
  • Thanks for alternate approach – awesomeprashant Feb 16 '17 at 13:06
  • Hope it helped. If performance is an issue, notice this approach is linear in terms of run time and iterates only twice over the dataframes (once for creating match column and once again for selecting indexes). – AndreyF Feb 16 '17 at 13:41
0
merged_df = pd.merge(df1, df2, how='outer', on='[your index]').merge(df3, how='outer', on='[your index]')

result = merged_df.loc[~merged_df.index.isin(merged_df.dropna().index)]

the result is what your want.

duke yu
  • 89
  • 4