Context
I have a database made of CSV files. I receive new files every month and need to update my database with them. However, I can't blindly overwrite one important column, because I need to keep track of its history.
Problem
I have two dataframes that look like this (the first three columns form the key):
Database dataframe (DD)
ID1  ID2  ID3  important_data  some_data1  some_data2  date
1    2    3    10              X           Y           2019-09-19 14:53:16.107278
4    5    6    10              M           N           2019-07-15 14:53:16.107278
Client dataframe (DC)
ID1  ID2  ID3  important_data  some_data1  some_data2  date
1    2    3    15              A           B           2019-10-19 14:53:16.107278
4    5    6    10              O           P           2019-09-18 14:53:16.107278
The first dataframe is the one currently in my database; the second is the new one provided by my client.
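In case it helps to reproduce the problem, the two frames can be built with something like this (my own reconstruction of the data above):

import pandas as pd

cols = ["ID1", "ID2", "ID3", "important_data", "some_data1", "some_data2", "date"]
DD = pd.DataFrame([[1, 2, 3, 10, "X", "Y", "2019-09-19 14:53:16.107278"],
                   [4, 5, 6, 10, "M", "N", "2019-07-15 14:53:16.107278"]], columns=cols)
DC = pd.DataFrame([[1, 2, 3, 15, "A", "B", "2019-10-19 14:53:16.107278"],
                   [4, 5, 6, 10, "O", "P", "2019-09-18 14:53:16.107278"]], columns=cols)

# Parse the date column so "last in date" comparisons work on real timestamps
DD["date"] = pd.to_datetime(DD["date"])
DC["date"] = pd.to_datetime(DC["date"])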
If the "important_data" has not been changed (case of the id 4 5 6) compared to the last matching tuple in date, I overwrite the old data with the new one :
New database dataframe (DD), after the overwrite case:
ID1  ID2  ID3  important_data  some_data1  some_data2  date
1    2    3    10              X           Y           2019-09-19 14:53:16.107278
4    5    6    10              O           P           2019-09-18 14:53:16.107278
Otherwise (key 1 2 3, or if the row is completely new), I need to append a whole new row to the database frame:
New database dataframe (DD), after the new-row case (shown on its own here):
ID1  ID2  ID3  important_data  some_data1  some_data2  date
1    2    3    10              X           Y           2019-09-19 14:53:16.107278
1    2    3    15              A           B           2019-10-19 14:53:16.107278
4    5    6    10              M           N           2019-07-15 14:53:16.107278
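Applying both rules together, I believe the final database frame after this month's update would be:

ID1  ID2  ID3  important_data  some_data1  some_data2  date
1    2    3    10              X           Y           2019-09-19 14:53:16.107278
1    2    3    15              A           B           2019-10-19 14:53:16.107278
4    5    6    10              O           P           2019-09-18 14:53:16.107278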
Question
I can do it with regular nested loops, like this:
import pandas as pd

key = ["ID1", "ID2", "ID3"]

# Select the last-in-date row per key: we only need the latest entries
dd_pending = DD.sort_values("date").drop_duplicates(key, keep="last")

new_rows = []
for _, row_c in DC.iterrows():
    match = (dd_pending[key] == row_c[key]).all(axis=1)
    if not match.any():                                   # completely new key
        new_rows.append(row_c)
    elif (dd_pending.loc[match, "important_data"]
          == row_c["important_data"]).all():              # unchanged: overwrite in place
        DD.loc[dd_pending.index[match]] = row_c.values
    else:                                                 # changed: keep the history
        new_rows.append(row_c)

DD = pd.concat([DD, pd.DataFrame(new_rows)], ignore_index=True)
But is it possible to simplify this, in terms of the resources needed by the machine, using a pandas merge? I looked at this, but it's not quite the same problem. I'm also looking at concat, but I can't find a way to make it work.
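The direction I'm experimenting with is something like the sketch below: merge the client frame against the latest database rows, then use the comparison to decide which old rows get dropped (overwritten) before appending everything from the client. The drop_duplicates and MultiIndex steps are just my guesses at the right tools, so I'm not sure this is correct or efficient:

import pandas as pd

key = ["ID1", "ID2", "ID3"]

# Latest database row per key
last = DD.sort_values("date").drop_duplicates(key, keep="last")

# Compare each client row against the latest database row for its key
m = DC.merge(last[key + ["important_data"]], on=key, how="left",
             suffixes=("", "_db"))
unchanged = m["important_data"] == m["important_data_db"]

# Overwrite = drop the latest DD row for every key whose important_data is unchanged...
drop_keys = pd.MultiIndex.from_frame(m.loc[unchanged, key])
to_drop = DD.index.isin(last.index) & pd.MultiIndex.from_frame(DD[key]).isin(drop_keys)

# ...then append every client row (overwrites, history rows, and new keys alike)
DD = pd.concat([DD[~to_drop], DC], ignore_index=True)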
Thanks!
PS: I tried to be as clear as possible with the examples, but if anything is unclear, feel free to ask for clarification!