I have two dataframes: df1 and df2. I want to eliminate all occurrences of df2 rows in df1. Basically, this is the set difference operator but for dataframes.

My question is very similar to this question, with one major difference: it's possible that df1 and df2 have no common rows at all. In that case, if we concat the two dataframes and then drop the duplicates, it still doesn't eliminate the df2 rows from df1. In fact, it adds them.

The question is also similar to this, except that I want my operation on the rows.

Example:

Case 1:
df1:
A,B,C,D
E,F,G,H

df2:
E,F,G,H

Then, df1-df2:
A,B,C,D

Case 2:
df1:
A,B,C,D

df2:
E,F,G,H

Then, df1 - df2:
A,B,C,D

Put simply, I am looking for a way to do df1 - df2 (remove the rows of df2 from df1 wherever they are present). How should this be done?

user248884
  • @OP can you please confirm whether these are series or DataFrames? If they are series, then isin will work. Otherwise, it will not. – cs95 Jan 25 '19 at 20:02
  • @coldspeed, the OP states that the operation is required on rows so definitely a dataframe – Vaishali Jan 25 '19 at 20:04
  • Can you also confirm that the logic is `the set difference`. Would `df2: F,E,G,H` lead to the same result? – ALollz Jan 25 '19 at 20:04

2 Answers

A set difference will work here: np.setdiff1d returns the sorted, unique values in ar1 that are not in ar2.

import numpy as np
np.setdiff1d(df1, df2)

Or to get the result in form of DataFrame,

pd.DataFrame([np.setdiff1d(df1, df2)])
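As a rough illustration of what the call above does, here is a minimal sketch on the Case 1 data from the question. The frames are built without column names, which is an assumption, and the `other` frame is made up purely to show one caveat: np.setdiff1d flattens its inputs, so values are compared one by one rather than row by row.

```python
import numpy as np
import pandas as pd

# Case 1 from the question (default integer column labels are an assumption).
df1 = pd.DataFrame([["A", "B", "C", "D"], ["E", "F", "G", "H"]])
df2 = pd.DataFrame([["E", "F", "G", "H"]])

# Both inputs are flattened, so this is a set difference over individual values.
diff = np.setdiff1d(df1, df2)
result = pd.DataFrame([diff])

# Hypothetical frame that shares only the value "A" with df1, not a whole row.
other = pd.DataFrame([["A", "X", "Y", "Z"]])
partial = np.setdiff1d(df1, other)  # "A" is dropped even though no row matches
```

On this data `diff` holds A, B, C, D, which matches the expected output. The `partial` result shows that a single shared value is enough to remove it, so this approach is only equivalent to a row-wise difference when rows do not share individual values.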
Vaishali

Try:

df1[~df1.isin(df2)]

A,B,C,D
anky
  • ^ Both rows are returned, not 'A,B,C,D'. Can you check this please? What is your input? – cs95 Jan 25 '19 at 19:48
  • @coldspeed you are correct, if index is as given in example, merge is better. :) I took same index into consideration – anky Jan 25 '19 at 19:58
  • If these are series, then it makes sense to just use isin. But OP has mentioned they have two DataFrames. So I am curious as to why so many are misunderstanding the question (even to go so far as to downvote my answer which is perfectly valid). – cs95 Jan 25 '19 at 20:00
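The index issue raised in these comments, and the merge approach mentioned as the better option, can be sketched roughly as follows. The column names c1..c4 and the helper `row_difference` are hypothetical, introduced only for illustration. This is one possible reading of "merge is better", namely a left merge with `indicator=True` used as an anti-join, and not necessarily what the commenters had in mind.

```python
import pandas as pd

cols = ["c1", "c2", "c3", "c4"]  # assumed column names, not given in the question
df1 = pd.DataFrame([["A", "B", "C", "D"], ["E", "F", "G", "H"]], columns=cols)
df2 = pd.DataFrame([["E", "F", "G", "H"]], columns=cols)

# DataFrame.isin with a DataFrame argument aligns on index and column labels.
# Row 0 of df1 is compared with row 0 of df2, so nothing matches here.
mask = df1.isin(df2)


def row_difference(left, right):
    """Return the rows of `left` that do not appear in `right` (hypothetical helper)."""
    # Deduplicate `right` so the merge cannot multiply rows of `left`.
    merged = left.merge(right.drop_duplicates(), how="left", indicator=True)
    # The "_merge" column is "left_only" for rows with no match in `right`.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


case1 = row_difference(df1, df2)            # Case 1 from the question
case2 = row_difference(df1.iloc[[0]], df2)  # Case 2: no common rows
```

Because the merge joins on all shared columns, rows are compared as whole units regardless of their index, and Case 2 simply returns df1 unchanged. Note that the result gets a fresh integer index rather than keeping the index of df1.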