0

I am looking for a way to update multiple objects more efficiently than calling .save() on each of them.

I have code, which uses .filter() to load objects. Then it communicates with an external service, in order to decide what need to be done with each object. Finally I have a list of objects with updated values, which I need to save.

This code works, but is too slow:

for o in l: o.save()

Searching for suggestions I have so far found .bulk_create() and .update()

bulk_create works great for new objects. But replacing above loop with model.objects.bulk_create(l) produces IntegrityError: UNIQUE constraint failed:, probably because it is trying to create new objects rather than updating existing objects.

Using .update() does not appear to be applicable to my use case either, because it will update all objects in the set with the same value. In my case I have computed a different value for each of the objects, that I need to save.

Is there a faster solution than calling .save() on each object?

kasperd
  • 1,952
  • 1
  • 20
  • 31

3 Answers3

0

You could try setting a has_changed flag in an object when it is modified, then only saving the objects where has_changed is True.

This is under the assumption that not all the objects are always modified though. If all the objects always change, this is not the solution.

mdw7326
  • 434
  • 7
  • 28
  • I don't call `save` on all the objects I retrieved. I run through all the objects once doing some calculations in memory. Those objects, which are modified by this calculation gets appended to a new list. The list I want to save in the end contains only objects, which have been modified. – kasperd Sep 19 '14 at 19:46
  • ahhh okay, well I'll keep pondering it in the back of my mind. – mdw7326 Sep 19 '14 at 19:55
0

Django can't do this, because (in general) your database can't do this. There's no single SQL statement to update different rows with different values. (Special exceptions like using CASE...WHEN aren't likely to help here.)

If many or most of the objects were new, you might track which and create those with bulk_create; otherwise, iterating and doing save() is probably your best bet.

You might consider parallelizing the procedure using threads, especially if there's significant latency communicating with the external service.

tcarobruce
  • 3,773
  • 21
  • 33
  • It might not have to be a single SQL statement. Sending multiple SQL statements back to back and deferring handling of replies, could be much faster than handling them completely sequential. – kasperd Sep 19 '14 at 20:59
  • Any reason django couldn't have a generic `bulk_save` method based on `CASE WHEN` like described here: http://stackoverflow.com/a/20255203/3476849 – kasperd Sep 19 '14 at 21:21
0

I found this workaround:

class CASE(object):
    def __init__(self, field_name, objects):
        self.field_name = field_name
        self.objects = objects

    def __unicode__(self): return self

    def as_sql(self, qn, connection):
        sql = [ 'CASE id' ]
        params = []
        for o in self.objects:
            sql.append('WHEN %s THEN %s')
            params.append(o.pk)
            params.append(getattr(o, self.field_name))
        sql.append('ELSE')
        sql.append(qn(self.field_name))
        sql.append('END')
        return (' '.join(sql), params)

model.objects.update(value=CASE('value', l))

It is not pretty, but as far as I can tell, it does produce correct and efficient SQL as long as I am just updating one text field.

kasperd
  • 1,952
  • 1
  • 20
  • 31
  • I feel I shouldn't accept my own answer because it is so narrow, that even though it may be a suitable workaround for my particular case, it doesn't answer the general version of the question as I stated it. In order to achieve that, it would at the very least have to automatically apply to all applicable fields. Also my workaround is a bit of a hack. And I'd rather rely on a build-in feature than constructing my own `as_sql` method. – kasperd Sep 20 '14 at 10:39
  • A cleaner implementation of the above would use `django.db.models.expressions` instead of my very hacky approach. – kasperd Sep 20 '14 at 12:24