0

I have a dataset with some employee information, and I would like to see if certain records appear in another DataFrame. However, there may be duplicate IDs (I know...), so I wanted to filter where the ID AND date of birth are the same. I tried doing it with a merge, but then all the columns get added, which I don't want. How should I go about this?

Example data:

df1 = pd.DataFrame({"ID": [1, 2, 3, 4, 5], "DOB": ["1987-12-03", "1993-04-05", "2000-01-24", "1995-05-18", "1974-10-10"], "JOB": [6, 7, 8, 9, 10]})
df2 = pd.DataFrame({"ID": [1, 1, 2, 3, 3, 4, 4], "DOB": ["1987-12-03", "1999-06-16", "1993-04-05", "2000-01-24", "1968-11-13", "1995-05-18", "1988-12-12"], "JOB": [6, 11, 7, 8, 12, 9, 13]})

I want the output to be:

   ID         DOB  JOB
0   1  1987-12-03    6
1   2  1993-04-05    7
2   3  2000-01-24    8
3   4  1995-05-18    9

Since those are the values that are in both DataFrames based on ID AND DOB.

MKJ
  • 120
  • 8

1 Answers1

1

You can drop the additional columns, and then do an inner merge:

out = df1.merge(df2.drop('JOB', axis=1), on=['ID', 'DOB'], how='inner')
oskros
  • 3,101
  • 2
  • 9
  • 28