0

I'm trying to build a history of my data where i add rows to my current dataset only if at least one column value has changed rather than adding every row everyday then clean the data by removing duplicates.

The code I wrote looks through every row of the new and current dataset and focuses on rows where IDs match. Then when ID matches, scans through every column (each dataset has the same columns) and looks for a difference. If there is one, the row is added to the current set. My current code adds too many rows and I can't figure out what to change to fix this. I also don't know how to add the new IDs that are not in the current dataset yet

Current dataset :

ID Age City Last update
1 13 London 01/02/2022
2 19 Bath 01/01/2022

New dataset :

ID Age City
1 13 London
2 19 Brighton
3 15 Leeds

Current dataset after code update :

ID Age City Last update
1 13 London 01/02/2022
2 19 Bath 01/01/2022
2 19 Brighton 22/02/2022
3 15 Leeds 22/02/2022

I add today's date to every row from the new dataset before running this loop, but when checking columns, i do not check it (as it is always different from last update).

Current code :

for i in range (len(current)):
    count=0
    for j in range (len(new)):
        #ID match
        if current.iloc[i,0] == new.iloc[j,0]:
            #checking every column for discrepancy
            for c in range (0,9):
                if (current.iloc[i,c] != new.iloc[j,c]) & (current.iloc[i,c] ==  current.iloc[i,c]) & (new.iloc[j,c] == new.iloc[j,c]) : #A==A used to ignore nan
                    count+=1             
    if count != 0:
        current = current.append(new.iloc[j,:])

From what I understand, the code works but if I have say ID 1 in two different rows already (because column 3 is different in first and second row for example), the code will work for both row and therefore add my new row twice. How can I fix this ? Also, how do I deal with IDs that are not in the current dataset yet ?

Thanks !

Lausaurine
  • 11
  • 2
  • You question is not fully clear, check if the duplicate works for you and if not please provide a counter-example. – mozway Feb 22 '23 at 14:29

0 Answers0