Create pandas df from difference between two dfs

Question

Set-up

I have two pandas data frames, df1 and df2, each containing two columns: id and its respective url.

df1:

| id | url |
------------
| 1  | url |
| 2  | url |
| 3  | url |
| 4  | url |

df2:

| id | url |
------------
| 2  | url |
| 4  | url |
| 3  | url |
| 5  | url |
| 6  | url |
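To make the set-up reproducible, here is a minimal sketch that builds both frames. It assumes placeholder url strings (url1, url2, ...) because the actual urls are not shown:

```python
import pandas as pd

# Placeholder urls (assumption): the real url values are not given in the question
df1 = pd.DataFrame({"id": [1, 2, 3, 4],
                    "url": ["url1", "url2", "url3", "url4"]})

# Same ids as the df2 table, in the same row order
df2 = pd.DataFrame({"id": [2, 4, 3, 5, 6],
                    "url": ["url2", "url4", "url3", "url5", "url6"]})
```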

Some observations are in both dfs, as the id column shows, e.g. observation 2 and its url are in both dfs.

The position of these shared observations need not be the same in both dfs, e.g. observation 2 is in the second row of df1 but in the first row of df2.

Lastly, the dfs do not necessarily have the same number of observations, e.g. df1 has four observations while df2 has five.

Problem

I want to extract all observations in df2 that do not appear in df1 and put them in a new df (df3), i.e. I want to obtain:

| id | url |
------------
| 5  | url |
| 6  | url |

How do I go about this?

I've tried this answer but cannot get it to work for my two-column dataframes.

I've also tried this other answer, but this gives me an empty common dataframe.

Are you after this: https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe? if so then it's a dupe — EdChum, Jul 12 '17 at 08:30
Please edit your post with your attempts from that question, stating that it doesn't work is not informative. I believe it would work but you need to prove it doesn't, also the answer from Greg should work, if it does then it's a dupe, if it doesn't then demonstrate this — EdChum, Jul 12 '17 at 08:59
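Both answers below compare the frames on id alone. If a row should only count as shared when both id and url match, one common alternative is a left anti-join with DataFrame.merge(..., indicator=True). This is a sketch with hypothetical placeholder urls, not code taken from the linked question:

```python
import pandas as pd

# Hypothetical placeholder data mirroring the question's tables
df1 = pd.DataFrame({"id": [1, 2, 3, 4],
                    "url": ["url1", "url2", "url3", "url4"]})
df2 = pd.DataFrame({"id": [2, 4, 3, 5, 6],
                    "url": ["url2", "url4", "url3", "url5", "url6"]})

# Left-join df2 against df1 on both columns; indicator=True adds a "_merge" column
merged = df2.merge(df1, on=["id", "url"], how="left", indicator=True)

# "left_only" marks df2 rows with no matching (id, url) pair in df1
df3 = merged.loc[merged["_merge"] == "left_only", ["id", "url"]].reset_index(drop=True)
```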

score 1 · Accepted Answer · answered Jul 12 '17 at 08:34

Possibly something like this: df3 = df2[~df2.id.isin(df1.id.tolist())]
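A self-contained sketch of this approach, using hypothetical placeholder urls. Series.isin also accepts a Series directly, so the .tolist() call is optional, and reset_index(drop=True) only renumbers the result:

```python
import pandas as pd

# Hypothetical placeholder data mirroring the question's tables
df1 = pd.DataFrame({"id": [1, 2, 3, 4],
                    "url": ["url1", "url2", "url3", "url4"]})
df2 = pd.DataFrame({"id": [2, 4, 3, 5, 6],
                    "url": ["url2", "url4", "url3", "url5", "url6"]})

# Boolean mask: True where df2's id does NOT appear anywhere in df1's id column
mask = ~df2["id"].isin(df1["id"])

# Keep only those rows; row order from df2 is preserved
df3 = df2[mask].reset_index(drop=True)
```

Because isin checks membership rather than position, it does not matter where a shared id sits in either frame or how many rows each frame has.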

Greg

score 1 · Answer 2 · answered Jul 12 '17 at 17:12

ID numbers make good index labels:

df1.index = df1.id
df2.index = df2.id

Then use the very straightforward index.difference:

diff_index = df2.index.difference(df1.index)
df3 = df2.loc[diff_index]
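A self-contained sketch of the same idea, with hypothetical placeholder urls. It uses set_index("id", drop=False), which keeps id as a column just as assigning df.index = df.id does, and it works on new frames so df1 and df2 are left unchanged:

```python
import pandas as pd

# Hypothetical placeholder data mirroring the question's tables
df1 = pd.DataFrame({"id": [1, 2, 3, 4],
                    "url": ["url1", "url2", "url3", "url4"]})
df2 = pd.DataFrame({"id": [2, 4, 3, 5, 6],
                    "url": ["url2", "url4", "url3", "url5", "url6"]})

# Use id as the index while keeping it as a regular column too
df1_idx = df1.set_index("id", drop=False)
df2_idx = df2.set_index("id", drop=False)

# Labels present in df2's index but absent from df1's index
diff_index = df2_idx.index.difference(df1_idx.index)

# Select those rows by label
df3 = df2_idx.loc[diff_index]
```

Index.difference sorts its result by default, so df3 comes out ordered by id rather than in df2's original row order, with the ids as its index.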