It seems there are quite a few questions and answers related to the speed of nested for loops - I think I looked at every single one of them! But unfortunately I am still not exactly sure why my code is slow. I'm hoping I can get some guidance from you fine people.
I download a csv file daily that has ~116,000 entries. Items are added and taken away from it at inconsistent points in the file, so every day I want to see what was added and what was taken away.
Getting the entries from csv to a list takes no time at all, for both the old and new list, but I encounter a big speed decrease in the next part of the code, although at the end, it does what I want and spits out the difference - items added and items removed.
Each of the 116,000 items in the list is a dictionary like so:
old or new = [{'Date Stamped': '', 'Name': '', 'Registration Number': '', 'Type': '', "Form Name': '', 'URL': "}]
when I get to this point:
added = [i for i in new if not i in old]
removed = [i for i in old if not i in new]
It takes 25 minutes to finish! I feel like this is a long time, but also I may not be understanding exactly what I'm doing.
Each list (old and new) has ~116000 items in it. Is that because i has to iterate through ~116,000 items 4 times?
It does what I want, in the end, but it seems awfully slow for what it's doing; that said, this is really the first time I've worked with a data set with this many items, so maybe it's par for course.
Is this slow because it is a nested for loop? Is it slow because of the size? I am definitely an amateur and really appreciate everyone's help. Thanks so much.