I have 2 pandas data frames which have multiple columns.
Some rows have same values in all columns except one column which is updated_at
.
I need to merge 2 data frames and consider the latest updated_at
value from matched rows. updated_at
is a datetime value.
I found a way to merge data frames but not sure how to use the latest value for updated_at
column.
pd.merge(
df1,
df2,
how="inner",
on=["column_1", "column_2", "column_3"])
Here's sample input for df1
and df2
.
df1:
MRN Encounter ID First Name Last Name Birth Date updated_at
1 1234 John Doe 01/02/1999 04/12/2002 6:00 PM
2 2345 Joanne Lee 04/19/2002 04/19/2002 7:22 PM
3 3456 Annabelle Jones 01/02/2001 04/21/2002 5:00 PM
df2:
MRN Encounter ID First Name Last Name Birth Date updated_at
1 1234 John Doe 01/02/1999 04/12/2002 5:00 PM
2 2345 Joanne Lee 04/19/2002 04/19/2002 8:22 PM
final_output:
MRN Encounter ID First Name Last Name Birth Date updated_at
1 1234 John Doe 01/02/1999 04/12/2002 6:00 PM
2 2345 Joanne Lee 04/19/2002 04/19/2002 8:22 PM
3 3456 Annabelle Jones 01/02/2001 04/21/2002 5:00 PM
Notice the updated_at
column is having latest value from matched records.