I am looking to join 2 dataframes. Left is a historical repository, right is current data along with possible updates to what is listed in historical.
I want to join based on matching 2 columns (Call them requestor
and date
.)
So, I want to keep all items in the repository and if there is a match with new data based on the 2 columns I want to update the remainder of that row with the new data.
Essentially this is an outer join that keys on two columns and overwrites with new data if there is a match.
Any ideas on the correct approach to do this?
Edit:
history = pd.concat([history,new])
mostrecentupdate = history.groupby(['Requestor','RequestDate'])['LastUpdateDate'].transform(max)
history = history[history['LastUpdateDate'] == mostrecentupdate]