Short Answer
Note: If my understanding of your question is correct, you should be able to do this using the snippet of python pandas code below.
merged_df = pd.merge(df1, df2[['Full name', 'Birth date']], on=['Full name', 'Birth date'], how='left', indicator=True)
df1 = merged_df[merged_df['_merge'] == 'left_only'].drop(columns=['_merge'])
where df1
only contains the rows which didn't have entries in df2
Additional details
The operation you're trying to perform is called an anti-join. In your case, you're trying to remove rows from df1
where there's a matching name and birthdate in df2
.
- First, you'd want to ensure that 'Full name' and 'Birth date' are of the same data type in both dataframes. This is necessary to ensure the merge operation works correctly.
- If 'Birth date' is a string in both dataframes, there's no problem. But if it's a datetime type, you need to ensure both are in the same format.
- To perform the anti-join, you can merge
df1
and df2
on 'Full name' and 'Birth date' using a left join, and then keep only the rows where 'Full name' and 'Birth date' from df2
are null.
- The
indicator=True
argument adds a column to the output DataFrame called _merge
with information on the source of each row. The values are 'left_only', 'right_only', or 'both' depending on the source of the data. Rows with a '_merge' value of 'left_only' are those that were in df1
but not in df2
, which is what you want.
- After this operation,
df1
will only contain rows that are not in df2
.