Comparing two dataframe columns to see if they have the same values

Question

I have two dataframe columns that looks something like this:

   col1    col2 
0   A2      B8
1   B8      C3
2   D2      A2

I want to identify all values in col1 which are not in col2 and visa versa. In the above example, the output would be:

C3, D2

Both lists are not unique.

Possible duplicate of [Compare two columns using pandas](https://stackoverflow.com/questions/27474921/compare-two-columns-using-pandas) — Avión, Jul 19 '17 at 14:43

asongtoruin · Answer 1 · 2017-07-19T14:55:39.087

Why not convert each column to a set and calculate the symmetric difference?

import pandas as pd

df = pd.DataFrame({'col1': ['A2', 'B8', 'D2'],
                   'col2': ['B8', 'C3', 'A2']})

print set(df['col1']).symmetric_difference(set(df['col2']))

This prints:

set(['C3', 'D2'])

EDIT:

If you want to track which elements came from where, you could adapt this and create a new dictionary, like so:

col1 = set(df['col1'])
col2 = set(df['col2'])

exclusive_items = {'col1': [x for x in col1 if x not in col2],
                   'col2': [x for x in col2 if x not in col1]}

print exclusive_items

Where each key in exclusive_items contains as its values the entries unique to that column. This prints:

{'col2': ['C3'], 'col1': ['D2']}

In fact, as we have sets we could simplify this to:

exclusive_items = {'col1': col1-col2,
                   'col2': col2-col1}

With this method, is there a way of knowing which was in col1 and not in col2 and which was in col2 and not in col1? — Cahn Ex, Jul 19 '17 at 14:48
@CahnEx not strictly speaking, but I've added a modified version to demonstrate how this could be possible — asongtoruin, Jul 19 '17 at 14:55

score 0 · Answer 2 · answered Jul 19 '17 at 14:54

import pandas as pd

df = pd.DataFrame({'col1': ['A2', 'B8', 'D2'],
                   'col2': ['B8', 'C3', 'A2']})


in_col1_not_in_col2 = list(set(df['col1'])-(set(df['col2'])))
in_col2_not_in_col1 = list(set(df['col2'])-(set(df['col1'])))

print('in_col1_not_in_col2: ')
print(in_col1_not_in_col2)
print('in_col2_not_in_col1: ')
print(in_col2_not_in_col1)

in_col1_not_in_col2: ['D2']

in_col2_not_in_col1: ['C3']

Comparing two dataframe columns to see if they have the same values

2 Answers2