I have three data-frames defined df1,df2 and df3. I need a simple way to find the unmatched records between df1, df2 and df3. Its opposite of the intersection of these data frames. If I compare it with SQL world it can be achieved by left outer joins between these three tables and filter on null columns to get unmatched records. How this can be easily done in Python?
Asked
Active
Viewed 1,652 times
0
-
related: http://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe – EdChum Feb 14 '17 at 11:37
-
Make a set and calculate the difference should do the trick. – Dschoni Feb 14 '17 at 11:37
2 Answers
0
If you want to join by a specific columns of the Dataframes you can do something like that:
indexes = df1[(df1[TARGET_COLUMN]!=df2[TARGET_COLUMN]) | (df1[TARGET_COLUMN]!=df3[TARGET_COLUMN])].index
Where the TARGET_COLUMN should be the name of the column.
If you want to compare between entire rows you can use:
def compare_indexes(x):
index = x.name
for column_name in df1.columns.values:
if not (x[column_name] == df2.iloc[index,:][column_name] == df3.iloc[index,:][column_name]):
return False
return True
df1["match"] = df1.apply(compare_indexes, axis=1)
indexes = df1[df1["match"]==False].index

AndreyF
- 1,798
- 1
- 14
- 25
-
-
Hope it helped. If performance is an issue, notice this approach is linear in terms of run time and iterates only twice over the dataframes (once for creating match column and once again for selecting indexes). – AndreyF Feb 16 '17 at 13:41
0
merged_df = pd.merge(df1, df2, how='outer', on='[your index]').merge(df3, how='outer', on='[your index]')
result = merged_df.loc[~merged_df.index.isin(merged_df.dropna().index)]
the result is what your want.

duke yu
- 89
- 4