How to add row to a current dataset only if at least one value has changed or if the ID doesn't exist yet?

Question

I'm trying to build a history of my data where i add rows to my current dataset only if at least one column value has changed rather than adding every row everyday then clean the data by removing duplicates.

The code I wrote looks through every row of the new and current dataset and focuses on rows where IDs match. Then when ID matches, scans through every column (each dataset has the same columns) and looks for a difference. If there is one, the row is added to the current set. My current code adds too many rows and I can't figure out what to change to fix this. I also don't know how to add the new IDs that are not in the current dataset yet

Current dataset :

ID	Age	City	Last update
1	13	London	01/02/2022
2	19	Bath	01/01/2022

New dataset :

ID	Age	City
1	13	London
2	19	Brighton
3	15	Leeds

Current dataset after code update :

ID	Age	City	Last update
1	13	London	01/02/2022
2	19	Bath	01/01/2022
2	19	Brighton	22/02/2022
3	15	Leeds	22/02/2022

I add today's date to every row from the new dataset before running this loop, but when checking columns, i do not check it (as it is always different from last update).

Current code :

for i in range (len(current)):
    count=0
    for j in range (len(new)):
        #ID match
        if current.iloc[i,0] == new.iloc[j,0]:
            #checking every column for discrepancy
            for c in range (0,9):
                if (current.iloc[i,c] != new.iloc[j,c]) & (current.iloc[i,c] ==  current.iloc[i,c]) & (new.iloc[j,c] == new.iloc[j,c]) : #A==A used to ignore nan
                    count+=1             
    if count != 0:
        current = current.append(new.iloc[j,:])

From what I understand, the code works but if I have say ID 1 in two different rows already (because column 3 is different in first and second row for example), the code will work for both row and therefore add my new row twice. How can I fix this ? Also, how do I deal with IDs that are not in the current dataset yet ?

Thanks !

You question is not fully clear, check if the duplicate works for you and if not please provide a counter-example. — mozway, Feb 22 '23 at 14:29

How to add row to a current dataset only if at least one value has changed or if the ID doesn't exist yet?

0 Answers0