0

I am looking to join 2 dataframes. Left is a historical repository, right is current data along with possible updates to what is listed in historical.

I want to join based on matching 2 columns (Call them requestor and date.)

So, I want to keep all items in the repository and if there is a match with new data based on the 2 columns I want to update the remainder of that row with the new data.

Essentially this is an outer join that keys on two columns and overwrites with new data if there is a match.

Any ideas on the correct approach to do this?

Edit:

history = pd.concat([history,new])                                                                    
mostrecentupdate = history.groupby(['Requestor','RequestDate'])['LastUpdateDate'].transform(max)         
history = history[history['LastUpdateDate'] == mostrecentupdate]               
cs95
  • 379,657
  • 97
  • 704
  • 746
pacificdune
  • 95
  • 1
  • 10

1 Answers1

0

Step 1 marge You are looking for concat Here you can see an example: https://pandas.pydata.org/pandas-docs/stable/merging.html Step2 filter Similar to here how do you filter pandas dataframes by multiple columns

ChaosPredictor
  • 3,777
  • 1
  • 36
  • 46
  • 1
    Thank you for your insight. After some research I believe I found a solution that will work I have edited my question. – pacificdune Sep 27 '17 at 23:16