How to delete rows from one dataframe that exist in another dataframe

Question

I have two dataframes. If df1 contains a person whose name and birth date also appear in df2, I want to delete all rows with that name and birth date from df1. How can I do this using pandas?

df1=

Full name	Birth date	param1	param2	param10
Name 1	date1	something	something	something
Name 2	date2	something	something	something
Name 3	date3	something	something	something
Name 4	date4	something	something	something

df2=

Full name	Birth date	param11	param12	param20
Name 1	date1	something	something	something
Name 2	date2	something	something	something

score 2 · Answer 1 · answered Aug 01 '23 at 18:27

Another possible solution:

d1 = df1.set_index(['Full name', 'Birth date'])
d2 = df2.set_index(['Full name', 'Birth date'])

d1.loc[d1.index.difference(d2.index)].reset_index()

Output:

  Full name Birth date     param1     param2    param10
0    Name 3      date3  something  something  something
1    Name 4      date4  something  something  something
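
One thing to keep in mind: `Index.difference` attempts to sort its result by default and returns unique values, so the rows may come back in a different order than in df1. A minimal sketch of an order-preserving variant, assuming the column names from the question (the sample values and the shuffled row order are placeholders), builds a boolean mask with `Index.isin` instead:

```python
import pandas as pd

keys = ['Full name', 'Birth date']

# Placeholder data mirroring the question's layout, deliberately unsorted
df1 = pd.DataFrame({
    'Full name': ['Name 4', 'Name 1', 'Name 3', 'Name 2'],
    'Birth date': ['date4', 'date1', 'date3', 'date2'],
    'param1': ['something'] * 4,
})
df2 = pd.DataFrame({
    'Full name': ['Name 1', 'Name 2'],
    'Birth date': ['date1', 'date2'],
    'param11': ['something'] * 2,
})

d1 = df1.set_index(keys)
d2 = df2.set_index(keys)

# Keep the rows of d1 whose (name, date) pair does not appear in d2;
# a boolean mask leaves the surviving rows in df1's original order
out = d1[~d1.index.isin(d2.index)].reset_index()
print(out)
```

Here 'Name 4' stays ahead of 'Name 3' because the mask never reorders the rows.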

answered Aug 01 '23 at 18:27

PaulS

It works fine. Thanks! – Сергей Корягин Aug 01 '23 at 20:49

eshirvana · Answer 2 · 2023-08-01T20:27:44.503

1

Here is one way:

merged = df1.merge(df2, on=['Full name', 'Birth date'], how='outer', indicator=True)

# Filter out rows that are present in both df1 and df2
out = merged[merged['_merge'] == 'left_only'][df1.columns]

Output:

  Full name Birth date     param1     param2    param10
2    Name 3      date3  something  something  something
3    Name 4      date4  something  something  something

edited Aug 01 '23 at 20:27

answered Aug 01 '23 at 17:01

eshirvana

Might be simpler to do `df1.merge(df2[['Full name', 'Birth date']], how='left', indicator=True)` then at the end, `.drop(columns='_merge')`. – wjandrea Aug 01 '23 at 18:00
With a full outer join, the last step also filters out rows that are only in `df2`, though there aren't any in this example. – wjandrea Aug 01 '23 at 18:02
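
The difference between the two joins can be sketched as follows. This is only an illustration: 'Name 5' is a hypothetical row that exists only in df2, added so that the outer join has something extra to carry.

```python
import pandas as pd

keys = ['Full name', 'Birth date']

df1 = pd.DataFrame({'Full name': ['Name 1', 'Name 3'],
                    'Birth date': ['date1', 'date3'],
                    'param1': ['something', 'something']})
# 'Name 5' is a hypothetical person present only in df2
df2 = pd.DataFrame({'Full name': ['Name 1', 'Name 5'],
                    'Birth date': ['date1', 'date5'],
                    'param11': ['something', 'something']})

outer = df1.merge(df2, on=keys, how='outer', indicator=True)
left = df1.merge(df2[keys], on=keys, how='left', indicator=True)

# The outer join carries a 'right_only' row that the final filter has to
# discard; the left join never produces one
print(outer['_merge'].value_counts())
print(left['_merge'].value_counts())

# With the left join, only the helper column needs dropping at the end
out = left[left['_merge'] == 'left_only'].drop(columns='_merge')
print(out)
```

Passing only the key columns of df2 also keeps param11 and friends out of the result, so no column selection is needed afterwards.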

score 0 · Answer 3 · edited Aug 01 '23 at 17:27

Short Answer

Note: If my understanding of your question is correct, you should be able to do this using the snippet of python pandas code below.

import pandas as pd

merged_df = pd.merge(df1, df2[['Full name', 'Birth date']], on=['Full name', 'Birth date'], how='left', indicator=True)
df1 = merged_df[merged_df['_merge'] == 'left_only'].drop(columns=['_merge'])

After this, df1 contains only the rows that have no matching entry in df2.

Additional details

The operation you're trying to perform is called an anti-join. In your case, you're trying to remove rows from df1 where there's a matching name and birthdate in df2.

First, you'd want to ensure that 'Full name' and 'Birth date' are of the same data type in both dataframes. This is necessary to ensure the merge operation works correctly.
- If 'Birth date' is a string in both dataframes, there's no problem. But if it's a datetime type, you need to ensure both are in the same format.
To perform the anti-join, you can merge df1 and df2 on 'Full name' and 'Birth date' using a left join, and then keep only the rows that found no match in df2.
- The indicator=True argument adds a column to the output DataFrame called _merge with information on the source of each row. The values are 'left_only', 'right_only', or 'both' depending on the source of the data. Rows with a '_merge' value of 'left_only' are those that were in df1 but not in df2, which is what you want.
After this operation, df1 will only contain rows that are not in df2.
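
As a rough sketch of the dtype point above, assume a hypothetical situation in which df1 stores 'Birth date' as strings while df2 stores it as datetimes. In that case the merge may raise an error or find no matches, so one option is to convert the column with `pd.to_datetime` first. The sample dates below are made up for illustration.

```python
import pandas as pd

keys = ['Full name', 'Birth date']

# Hypothetical data: df1 holds dates as strings, df2 as datetimes
df1 = pd.DataFrame({'Full name': ['Name 1', 'Name 3'],
                    'Birth date': ['1990-01-01', '1985-05-05'],
                    'param1': ['something', 'something']})
df2 = pd.DataFrame({'Full name': ['Name 1'],
                    'Birth date': pd.to_datetime(['1990-01-01']),
                    'param11': ['something']})

# Convert the string column so both key columns hold datetimes
df1['Birth date'] = pd.to_datetime(df1['Birth date'])
print(df1['Birth date'].dtype, df2['Birth date'].dtype)

# Anti-join: left merge on the keys, then keep the unmatched rows
merged = df1.merge(df2[keys], on=keys, how='left', indicator=True)
result = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(result)
```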

*"If 'Birth date' is a string in both dataframes, there's no problem. But if it's a datetime type, you need to ensure both are in the same format."* -- I think you have those backwards, cause datetimes don't have format. — wjandrea, Aug 01 '23 at 17:28

wjandrea · Answer 4 · 2023-08-01T17:50:15.763

You could do a left join with indicator (see Pandas Merging 101) then select the rows that came from the left.

keys = ['Full name', 'Birth date']
left_only = (
    df1[keys].merge(df2[keys].drop_duplicates(), how='left', indicator=True)
    ['_merge'].eq('left_only')
    )
df1[left_only]

  Full name Birth date     param1     param2    param10
2    Name 3      date3  something  something  something
3    Name 4      date4  something  something  something

The .drop_duplicates() isn't necessary in this example, but I added it just in case.
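
To illustrate why it can matter, here is a small sketch with a hypothetical df2 in which the same person is listed twice. Without `.drop_duplicates()`, the left join repeats the matching df1 row, so the mask no longer lines up one-to-one with df1.

```python
import pandas as pd

keys = ['Full name', 'Birth date']

df1 = pd.DataFrame({'Full name': ['Name 1', 'Name 2', 'Name 3', 'Name 4'],
                    'Birth date': ['date1', 'date2', 'date3', 'date4'],
                    'param1': ['something'] * 4})
# Hypothetical case: 'Name 1' appears twice in df2
df2 = pd.DataFrame({'Full name': ['Name 1', 'Name 1', 'Name 2'],
                    'Birth date': ['date1', 'date1', 'date2']})

raw = df1[keys].merge(df2[keys], how='left', indicator=True)
deduped = df1[keys].merge(df2[keys].drop_duplicates(), how='left', indicator=True)

# raw has one row more than df1; deduped has exactly as many rows as df1
print(len(df1), len(raw), len(deduped))

# The de-duplicated mask aligns row for row with df1
left_only = deduped['_merge'].eq('left_only')
print(df1[left_only])
```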

Chris · Answer 5 · 2023-08-01T17:47:21.067

0

As pointed out, if you have duplicates in your original DF this will drop them too - but since it's not indicated in your question I'll leave it up.

You could concat just the columns you want to check for dupes on and drop on the same set of columns:

out = (
    pd.concat([df1, df2[['Full name', 'Birth date']]])
    .drop_duplicates(subset=['Full name', 'Birth date'], keep=False)
)

edited Aug 01 '23 at 17:47

answered Aug 01 '23 at 17:38

Chris

This looks like, if `df1` has duplicates that aren't in `df2`, it'll delete them. Is that right? – wjandrea Aug 01 '23 at 17:45
Good point, I'll update the answer – Chris Aug 01 '23 at 17:46
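
The caveat raised in these comments can be sketched with a hypothetical df1 in which 'Name 3' appears twice and never occurs in df2. The data is made up, and the merge-based variant is included only for comparison.

```python
import pandas as pd

keys = ['Full name', 'Birth date']

# Hypothetical case: 'Name 3' is duplicated in df1 and absent from df2
df1 = pd.DataFrame({'Full name': ['Name 1', 'Name 3', 'Name 3'],
                    'Birth date': ['date1', 'date3', 'date3'],
                    'param1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'Full name': ['Name 1'],
                    'Birth date': ['date1'],
                    'param11': ['x']})

# concat + drop_duplicates(keep=False) removes every repeated key,
# including the 'Name 3' rows that have no counterpart in df2
via_concat = (
    pd.concat([df1, df2[keys]])
    .drop_duplicates(subset=keys, keep=False)
)

# A left merge with an indicator column keeps both 'Name 3' rows
merged = df1.merge(df2[keys].drop_duplicates(), on=keys, how='left', indicator=True)
via_merge = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

print(len(via_concat), len(via_merge))
```

Note also that with the concat approach, any key present only in df2 would survive into the result with missing values in df1's other columns.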
