2

I'm trying to get the complement of two dataframes based on a column attribute.

pd.merge can take the intersection, but is there a simple way to take the compliment without creating/shuffling more dataframes?

Someone else answered this question (how to find the complement of two dataframes) here with a method of creating/shuffling more dataframes, but I'm wondering if there's a more straight forward way to do this with a function.

QSTY
  • 43
  • 3

2 Answers2

2

You can use MultiIndexing and eliminate common index and append:

df1.set_index(['key1', 'key2'], inplace=True)
df2.set_index(['key1', 'key2'], inplace=True)
df = df1[~df1.index.isin(df2.index)].append(df2[~df2.index.isin(df1.index)], sort=False)\
                                    .reset_index()

print(df)
  key1 key2    A    B    C    D
0   K0   K1   A1   B1  NaN  NaN
1   K2   K1   A3   B3  NaN  NaN
2   K2   K0  NaN  NaN   C3   D3
Space Impact
  • 13,085
  • 23
  • 48
2

You could do an outer merge on the two dataframes, setting indicator to be True. Then subset the data based on the _merge column:

combined = df1.merge(df2, on='col1', how='outer', indicator=True)
combined[combined._merge != 'both']

Example:

print(df1)
#    col1  col2
# 0     1     1
# 1     2     2
# 2     3     3
# 3     4     4
# 4     5     5
# 5     6     6

print(df2)
#    col1  col3
# 0     6     1
# 1     7     2
# 2     8     3
# 3     9     4
# 4    10     5
# 5    11     6

print(combined)
#     col1  col2  col3      _merge
# 0      1   1.0   NaN   left_only
# 1      2   2.0   NaN   left_only
# 2      3   3.0   NaN   left_only
# 3      4   4.0   NaN   left_only
# 4      5   5.0   NaN   left_only
# 6      7   NaN   2.0  right_only
# 7      8   NaN   3.0  right_only
# 8      9   NaN   4.0  right_only
# 9     10   NaN   5.0  right_only
# 10    11   NaN   6.0  right_only
Joe Patten
  • 1,664
  • 1
  • 9
  • 15