Ok so for a problem I need to parse two files of data and then compare them. I have to report any inconsistent data, as well as any data that appears in one file but not the other. To do this I sort both lists of data, which lets me compare the first element of each list: if they are equal I remove both; if they are inconsistent I report and remove both; if the dates differ I report which file is missing data and then PUT BACK the later item so it can be compared on the next pass.
As you can see in my code (at least I think, and I have tested extensively), this method works well for data sets of 100-200 lines per file. For larger sets, around 1,000 to 1,000,000 lines, it takes too long to produce a report.
I am stumped as to how my while loop could be causing this. See below. The split represents the date (split[0]) and then a piece of information (split[1]).
Any help would be appreciated; this is actually my first Python program.
tl;dr: For some reason my program works fine on small data sets, but on larger data sets it does not finish in a reasonable time. It isn't the sort()s either (i.e. something in my first while loop is killing the run time).
ws1.sort()
ws2.sort()

while ws1 and ws2:
    currItem1 = ws1.pop(0)
    currItem2 = ws2.pop(0)
    if currItem1 == currItem2:
        continue
    splitWS1 = currItem1.split()
    splitWS2 = currItem2.split()
    if splitWS1[0] == splitWS2[0] and splitWS1[1] != splitWS2[1]:
        print("Inconsistent Data (" + splitWS1[0] + "): A: " + splitWS1[1] + " B: " + splitWS2[1])
        continue
    elif splitWS1[0] < splitWS2[0]:
        print("Missing Data (" + splitWS1[0] + ") in data set A but not in B")
        ws2.insert(0, currItem2)
        continue
    elif splitWS1[0] > splitWS2[0]:
        print("Missing Data (" + splitWS2[0] + ") in data set B but not in A")
        ws1.insert(0, currItem1)
        continue

while ws2:
    currItem2 = ws2.pop(0)
    splitWS2 = currItem2.split()
    print("Missing data (" + splitWS2[0] + ") in data set B but not in A")

while ws1:
    currItem1 = ws1.pop(0)
    splitWS1 = currItem1.split()
    print("Missing data (" + splitWS1[0] + ") in data set A but not in B")
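For reference, here is a self-contained sketch of the same merge-style comparison, pulled out into a function so it's easy to run against test data. The function name `compare_sets` and the use of `collections.deque` are my own choices, not part of the original code; it collects report strings instead of printing them:

```python
from collections import deque

def compare_sets(ws1, ws2):
    """Merge-compare two sorted lists of 'date value' lines."""
    reports = []
    # deque.popleft()/appendleft() are O(1), unlike list.pop(0)/insert(0, ...)
    a, b = deque(sorted(ws1)), deque(sorted(ws2))
    while a and b:
        item1, item2 = a.popleft(), b.popleft()
        if item1 == item2:
            continue
        d1, v1 = item1.split()
        d2, v2 = item2.split()
        if d1 == d2:
            reports.append("Inconsistent Data (" + d1 + "): A: " + v1 + " B: " + v2)
        elif d1 < d2:
            reports.append("Missing Data (" + d1 + ") in data set A but not in B")
            b.appendleft(item2)  # put back the later item for the next pass
        else:
            reports.append("Missing Data (" + d2 + ") in data set B but not in A")
            a.appendleft(item1)  # put back the later item for the next pass
    # whatever is left over exists in only one data set
    for item in b:
        reports.append("Missing data (" + item.split()[0] + ") in data set B but not in A")
    for item in a:
        reports.append("Missing data (" + item.split()[0] + ") in data set A but not in B")
    return reports
```

For example, `compare_sets(["2020-01-01 x", "2020-01-02 y"], ["2020-01-01 z", "2020-01-03 w"])` reports one inconsistency for 2020-01-01 and one missing entry on each side.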