
I have a question on how to speed up my code.

From an API request I retrieve a huge table, which I convert into a list of dictionaries. I do this every X minutes to keep it up to date.

Now I want to upload this to Firebase Firestore to use in my web app. To limit the number of reads and writes to Firebase, I only want to send the changes in the table.

I use this function to calculate the differences (added, changed and deleted).

```python
import os
import pickle


def getDifferences(table, tableInfo):
    tableName = tableInfo[0]
    tableNumber = tableInfo[1]  # not used in this function
    ID = tableInfo[2]           # key of the field that identifies a record
    tableOld = []

    path = '/home/bellboy/Python/history/' + tableName + '.txt'

    # Load the previous snapshot, if there is one
    if os.path.isfile(path):
        with open(path, 'rb') as f:
            tableOld = pickle.load(f)

    # Save the current table as the snapshot for the next run
    with open(path, 'wb') as f:
        pickle.dump(table, f)

    # Added: not in the old table, and its ID did not exist before
    added = [i for i in table if i not in tableOld and not any(
        d[ID] == i[ID] for d in tableOld)]
    # Changed: not in the old table, but its ID did exist before
    changed = [i for i in table if i not in tableOld and any(
        d[ID] == i[ID] for d in tableOld)]
    # Deleted: not in the new table, and its ID no longer exists
    deleted = [i for i in tableOld if i not in table and not any(
        d[ID] == i[ID] for d in table)]

    return [added, changed, deleted]
```
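In `getDifferences`, every `i not in tableOld` and every `any(...)` scans the whole other list, so each comprehension does roughly n × m comparisons; with 40,000 records on both sides that is on the order of 1.6 billion dictionary comparisons per comprehension. A minimal sketch of a common alternative, assuming the ID field is unique within each table, is to index both lists by ID in a dict once and then use constant-time lookups. The function name `get_differences_by_id`, its parameters, and the sample records are hypothetical.

```python
def get_differences_by_id(table, table_old, id_key):
    """Return [added, changed, deleted] like getDifferences, in linear time.

    Hypothetical helper; assumes id_key is unique within each table.
    """
    # One pass over each list builds an ID -> record index
    old_by_id = {row[id_key]: row for row in table_old}
    new_by_id = {row[id_key]: row for row in table}

    # Dict membership tests are O(1) on average, unlike list scans
    added = [row for row in table if row[id_key] not in old_by_id]
    changed = [row for row in table
               if row[id_key] in old_by_id and old_by_id[row[id_key]] != row]
    deleted = [row for row in table_old if row[id_key] not in new_by_id]
    return [added, changed, deleted]


# Hypothetical records for illustration
old = [
    {'klantnummer': u'204829', 'emailadres': u'a@example.nl'},
    {'klantnummer': u'204830', 'emailadres': u'b@example.nl'},
]
new = [
    {'klantnummer': u'204829', 'emailadres': u'changed@example.nl'},
    {'klantnummer': u'204831', 'emailadres': u'c@example.nl'},
]
added, changed, deleted = get_differences_by_id(new, old, 'klantnummer')
```

Inside `getDifferences`, the pickle load and save could stay as they are, with the three comprehensions replaced by a call like `get_differences_by_id(table, tableOld, ID)`. Building the two dicts costs one pass per list, so the total work grows roughly linearly with the table size.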

How can I speed up this function? The lists can be 40,000 dictionaries long.

The ID check is necessary because the values within a dictionary can change while its ID stays the same.
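To illustrate why the ID check matters: if records are compared only as whole dictionaries (here hashed as frozensets of their items, which assumes every value is hashable), a customer whose e-mail address changed shows up as one deletion plus one addition rather than as one change. The helper `as_key` and the records are hypothetical.

```python
def as_key(record):
    # frozenset of (field, value) pairs is hashable, so it can go in a set
    return frozenset(record.items())


# Hypothetical: the same customer before and after an e-mail change
old = [{'klantnummer': u'204829', 'emailadres': u'old@example.nl'}]
new = [{'klantnummer': u'204829', 'emailadres': u'new@example.nl'}]

old_keys = {as_key(r) for r in old}
new_keys = {as_key(r) for r in new}

# Whole-record comparison only: the edited customer lands on both sides
only_in_new = [r for r in new if as_key(r) not in old_keys]
only_in_old = [r for r in old if as_key(r) not in new_keys]
```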

Sample data:

```python
[
    {'volgnummer': u'001', 'naam': u'xxxxxx', 'telefoonnummer': u'0311-xxxxxxx', 'emailadres': u'xxxxxxx@xxxxxx.nl', 'klantnummer': u'204829'},
    {'volgnummer': u'001', 'naam': u'xxxxxx', 'telefoonnummer': u'0311-xxxxxxx', 'emailadres': u'xxxxxxx@xxxxxx.nl', 'klantnummer': u'204830'},
    {'volgnummer': u'001', 'naam': u'xxxxxx', 'telefoonnummer': u'0311-xxxxxxx', 'emailadres': u'xxxxxxx@xxxxxx.nl', 'klantnummer': u'204831'},
    {'volgnummer': u'001', 'naam': u'xxxxxx', 'telefoonnummer': u'0311-xxxxxxx', 'emailadres': u'xxxxxxx@xxxxxx.nl', 'klantnummer': u'204832'},
    {'volgnummer': u'001', 'naam': u'xxxxxx', 'telefoonnummer': u'0311-xxxxxxx', 'emailadres': u'xxxxxxx@xxxxxx.nl', 'klantnummer': u'204833'},
    {'volgnummer': u'001', 'naam': u'xxxxxx', 'telefoonnummer': u'0311-xxxxxxx', 'emailadres': u'xxxxxxx@xxxxxx.nl', 'klantnummer': u'204834'},
    {'volgnummer': u'001', 'naam': u'xxxxxx', 'telefoonnummer': u'0311-xxxxxxx', 'emailadres': u'xxxxxxx@xxxxxx.nl', 'klantnummer': u'204835'},
]
```
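In the sample, every record has `volgnummer` `'001'` while `klantnummer` differs, so `klantnummer` looks like the field to use as the ID (an inference from these seven rows only). Any approach that indexes records by ID relies on that field being unique, which a quick `collections.Counter` check can confirm. The helper `duplicate_ids` is hypothetical, and the sample is trimmed to two fields.

```python
from collections import Counter


def duplicate_ids(table, id_key):
    """Return the ID values that occur more than once in table."""
    counts = Counter(row[id_key] for row in table)
    return [value for value, n in counts.items() if n > 1]


# Trimmed version of the sample records above
sample = [
    {'volgnummer': u'001', 'klantnummer': u'204829'},
    {'volgnummer': u'001', 'klantnummer': u'204830'},
    {'volgnummer': u'001', 'klantnummer': u'204831'},
]
print(duplicate_ids(sample, 'klantnummer'))  # []
print(duplicate_ids(sample, 'volgnummer'))   # ['001']
```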
  • I think if you used a pandas DataFrame for this you might be able to get a speedup. You can use various types of `pd.merge()` to find the deleted and added entries, and I think there should be a way to look for changes too. I can't really come up with a test without some example data. – bart cubrich Mar 27 '19 at 17:58
  • Agreed: a dict is the wrong data structure for this problem, and if you can make a table out of it, the comparison would go much quicker. `pandas` is good at this kind of thing. – Green Cloak Guy Mar 27 '19 at 18:10
  • [Here](https://stackoverflow.com/questions/32815640/how-to-get-the-difference-between-two-dictionaries-in-python) is a list of answers about comparing dicts; you might get a speedup using `set` operations if you decide to keep it in a dict structure. – G. Anderson Mar 27 '19 at 18:10
  • @bartcubrich I added some sample data so you can see the structure. – Rémon van Nieuwenhuizen Mar 27 '19 at 18:12
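Following the `pd.merge()` suggestion in the comments, a minimal sketch: an outer merge on the ID column with `indicator=True` labels each row `left_only` (deleted), `right_only` (added) or `both`, and rows in `both` count as changed if any value column differs. It assumes both frames have the same columns, the ID is unique in each, and there are no missing values (NaN never equals NaN, so it would be flagged as a change). The function name `diff_with_merge` and the records are hypothetical.

```python
import pandas as pd


def diff_with_merge(old_df, new_df, id_col):
    """Return (added, changed, deleted) DataFrames keyed on id_col."""
    merged = old_df.merge(
        new_df, on=id_col, how="outer",
        suffixes=("_old", "_new"), indicator=True,
    )
    value_cols = [c for c in new_df.columns if c != id_col]

    # Rows only in the new frame were added; rows only in the old were deleted
    added_ids = merged.loc[merged["_merge"] == "right_only", id_col]
    deleted_ids = merged.loc[merged["_merge"] == "left_only", id_col]

    # For IDs present on both sides, flag rows where any value differs
    both = merged[merged["_merge"] == "both"]
    old_vals = both[[c + "_old" for c in value_cols]].to_numpy()
    new_vals = both[[c + "_new" for c in value_cols]].to_numpy()
    changed_ids = both.loc[(old_vals != new_vals).any(axis=1), id_col]

    added = new_df[new_df[id_col].isin(added_ids)]
    changed = new_df[new_df[id_col].isin(changed_ids)]
    deleted = old_df[old_df[id_col].isin(deleted_ids)]
    return added, changed, deleted


# Hypothetical records for illustration
old_df = pd.DataFrame([
    {'klantnummer': '204829', 'emailadres': 'a@example.nl'},
    {'klantnummer': '204830', 'emailadres': 'b@example.nl'},
])
new_df = pd.DataFrame([
    {'klantnummer': '204829', 'emailadres': 'changed@example.nl'},
    {'klantnummer': '204831', 'emailadres': 'c@example.nl'},
])
added, changed, deleted = diff_with_merge(old_df, new_df, 'klantnummer')
```

`pd.DataFrame(table)` builds a frame from the list of dictionaries, and `changed.to_dict('records')` turns a result back into a list of dictionaries for uploading.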
