I have a Django model that records transactions. I need to update only some of the fields (two) of some of the transactions.
In order to update, the user is asked to provide additional data and I use pandas to make calculations using this extra data.
I use the output from the pandas script to update the original model like this:
for i in df.tnsx_uuid:
t = Transactions.objects.get(tnsx_uuid=i)
t.start_bal = df.loc[df.tnsx_uuid==i].start_bal.values[0]
t.end_bal = df.loc[df.tnsx_uuid==i].end_bal.values[0]
t.save()
this is very slow. What is the best way to do this?
UPDATE: after some more research, I found bulk_update and changed the code to:
transactions = Transactions.objects.select_for_update()\
.filter(tnsx_uuid__in=list(df.tnsx_uuid)).only('start_bal', 'end_bal')
for t in transactions:
i = t.tnsx_uuid
t.start_bal = df.loc[df.tnsx_uuid==i].start_bal.values[0]
t.end_bal = df.loc[df.tnsx_uuid==i].end_bal.values[0]
Transactions.objects.bulk_update(transactions, ['start_bal', 'end_bal'])
this has approximately halved the time required.
How can I improve performance further?