0

I have a code, that should work under concurrent request and heavy load.

I wrote an example to give a better understanding of that I'm trying to do:

def add_tag():
    with transaction.atomic():
        image = Image.objects.get(pk=2)
        tag = Tag.objects.get(pk=6)

        image.tags.add(tag) # concurrent insert

    return 'done'


class Command(BaseCommand):
    def handle(self, *args, **options):
        with ProcessPoolExecutor(max_workers=3) as executor:
            futures = []
            for _ in range(3):
                futures.append(executor.submit(add_tag))

            for future in as_completed(futures):
                print(future.result())

And here is my models:

class Image(models.Model):
    title = models.CharField(max_length=255)
    tags = models.ManyToManyField('ov_tags.Tag')

class Tag(models.Model):
    title = models.CharField(max_length=255)

I'm trying to insert in ManyToMany-relation table in parallel. Obviously, this causes an error, because of READ COMMITED isolation level:

django.db.utils.IntegrityError: duplicate key value violates unique constraint

Absolutely fine, but how to remove this error completely?

To protect my image, I tried to use select_for_update on Image select.

image = Image.objects.select_for_update().get(pk=2)

And... It works! I run it several times. There are no errors anymore and item inserted correctly. But I don't know why?

Is select_for_update locking relational table anyhow? Or is it happening on an application side? Is there a right way to achieve such behavior?

Can I use empty select to lock for insert?

SELECT "image_tags"."tag_id" FROM "image_tags" WHERE ("image_tags"."tag_id" IN (6) AND "image_tags"."image_id" = 2) FOR UPDATE
asnelzin
  • 174
  • 2
  • 11

2 Answers2

5

On a database level, you're only locking the specific Image instance that you're adding tags to. You're correct that this does not prevent inserts into the relational table. If another piece of code ignores the lock and simply inserts a new row in the relation table, you can still run into trouble.

It works for this piece of code because each transaction is "well-behaved". Each transactions first acquires a lock on the specific image, before it adds new entries to the relational table. This means that each process in the executor pool will wait for the current process to finish its transaction before it attempts to add new rows in the relational table.

This would also work if you'd lock the Tag instead of the Image, but it doesn't work if some code locks the Tag, while other code locks the Image. At that point, one process can acquire the lock on the Image, but the other process doesn't wait because it can still acquire the lock on the Tag, and both processes try to insert the same row into the relational table at the same time.

That's what I mean with "well-behaved": each part of your application must behave in a specific way (acquire the same lock). If just one part of your application ignores this requirement, you can run into race conditions. Only if all parts of your application behave well, will you be able to prevent race conditions in this way.

knbk
  • 52,111
  • 9
  • 124
  • 122
  • I think you've described the situation (and its possible problem) very well. But I would add 2 things: 1) even with this well-behaved transactions, this kind of locking is unnecessarily strict: these `select_for_update` calls wait for `UPDATE`s on these `Image` rows too (and the other way around). 2) What OP want is usually called `UPSERT` & [PostgreSQL supports it](http://stackoverflow.com/questions/17267417/how-to-upsert-merge-insert-on-duplicate-update-in-postgresql) (however I'm not sure if it's available in django at all). – pozs Apr 20 '17 at 13:55
-1

This is exactly what is happening, the select_for_update call is locking the Image table at the database level, so that no other transaction will be able to modify the selected rows until the end of the transaction.atomic block.

See for reference https://docs.djangoproject.com/en/1.11/ref/models/querysets/#select-for-update

  • 1
    I'm pretty sure this is not a table level lock. ``If FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE or FOR KEY SHARE is specified, the SELECT statement locks the selected rows against concurrent updates.`` https://www.postgresql.org/docs/9.5/static/sql-select.html – asnelzin Apr 20 '17 at 08:53
  • And I'm not modifying selected Image table rows. – asnelzin Apr 20 '17 at 08:54
  • That's correct, it's not locking the entire table, but only the relevant rows. – Stefano Cianciulli Apr 20 '17 at 08:56