how to find the complement of two dataframes

Question

given two large dataframes, is there any concise and efficient code (avoid using any for loop directly) that allow me to obtain the complement of these two dataframes?

the most straight forward way to me is to compute union-intersection as shown in the naive example below, but I do not know how to implement this in an elegant languages of pandas or np

df1= pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                   'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})     
df2= pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})        
intersection= pd.merge(df1, df2, how='inner',on=['key1', 'key2'])
union=pd.merge(df1, df2, how='outer',on=['key1', 'key2'])       


complement=union-intersection

thanks for any comments and answers

Possible duplicate of [picking out elements based on complement of indices in Python pandas](http://stackoverflow.com/questions/14986510/picking-out-elements-based-on-complement-of-indices-in-python-pandas) — Mathias711, Aug 11 '16 at 21:38
@Mathias711, thanks. To apply their solution, I need to find the index of the `intersection` in the `union`. then I can apply their method to exclude previously obtained index — user6651227, Aug 11 '16 at 22:03

score 24 · Accepted Answer · answered Aug 11 '16 at 22:02

Starting with this:

df1= pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                   'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})     
df2= pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})        
intersection  = pd.merge(df1, df2, how='inner',on=['key1', 'key2'])
union         = pd.merge(df1, df2, how='outer',on=['key1', 'key2'])

print union

     A    B key1 key2    C    D
0   A0   B0   K0   K0   C0   D0
1   A1   B1   K0   K1  NaN  NaN
2   A2   B2   K1   K0   C1   D1
3   A2   B2   K1   K0   C2   D2
4   A3   B3   K2   K1  NaN  NaN
5  NaN  NaN   K2   K0   C3   D3

print intersection

    A   B key1 key2   C   D
0  A0  B0   K0   K0  C0  D0
1  A2  B2   K1   K0  C1  D1
2  A2  B2   K1   K0  C2  D2

union-intersection try this:

union[union.isnull().any(axis=1)]

     A    B key1 key2    C    D
1   A1   B1   K0   K1  NaN  NaN
4   A3   B3   K2   K1  NaN  NaN
5  NaN  NaN   K2   K0   C3   D3

thanks a lot, and I do not have enough reputation the upvote you though — user6651227, Aug 12 '16 at 13:10

how to find the complement of two dataframes

1 Answers1

Linked