
I've been reading up on PostgreSQL transaction isolation and how that relates to Django's transaction.atomic() (e.g. this article, PostgreSQL docs), but I'm far from fluent in this topic and I'm not sure I understand what I've read.

We've got a PostgreSQL-backed Django app that involves quota objects. Simplified, it's just this:

class Quota(models.Model):
    obj = models.OneToOneField(AnotherModel, on_delete=models.CASCADE)
    count = models.PositiveIntegerField()

An instance of this controls how many times a certain operation can be performed against the obj instance. count is initialized to a certain number, and will only ever decrement until it hits zero.

Any number of processes/threads can concurrently perform these operations. Basically, we need to atomically decrement (with UPDATE) the count of a single database row without deadlocking and without two processes/threads ever e.g. starting with a count of 100 and both trying to decrement it to 99.

My naive approach would be this:

with transaction.atomic():
    cursor = connection.cursor()
    cursor.execute('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE')
    Quota.objects.filter(obj=instance).update(count=F('count')-1)

However, I'm not sure if this is subject to this issue, from the linked article:

if at COMMIT the database cannot determine that the transaction could have been performed serially with respect to the read/writes of other transactions, then it will fail with a django.db.DatabaseError. This can happen even if they updated different rows.

All the processes/threads performing operations against the same obj would be decrementing the same column of the same row, so... maybe? I don't actually know what's involved in PostgreSQL "determin[ing] that the transaction could have been performed serially".

An alternate approach could be:

with transaction.atomic():
    Quota.objects.select_for_update().filter(obj=instance).update(count=F('count')-1)

This seems to do row-level locking, and my understanding is that the isolation level change isn't needed, but I don't know if this is sufficient for correct handling of concurrent operations.

Is one of these approaches preferable here, and are some modifications still necessary to guarantee atomicity and deadlock avoidance? We could use something like python-redis-lock to also prevent concurrent DB operations at the Django view level, but this feels like a more natural fit to do at the DB level.

JK Laiho

4 Answers


I cannot tell you what to do in Django, but I can explain it on the SQL level.

You can never deadlock by modifying only a single row. All you can ever get is a live lock, where one updating transaction has to wait until the previous one commits. This live lock cannot be avoided, it is the way that the database serializes modifications of the same row.

The only way you can get a deadlock is if several database transactions try to lock the same objects (plural!) in a different order.

Here are a few tips that you can use to avoid problems:

  • Keep your database transactions as short as possible, so that nobody has to wait long for a lock. That also reduces the risk of deadlocks.

  • Don't modify more data in a single transaction than is absolutely necessary for consistency. The more rows you modify (lock), the greater the risk of a deadlock.

  • Update the counter as the last activity before you commit (or as late as possible), so that the row is locked as short as possible. If you always update the counter as the last activity, you can never run into a deadlock on account of that update!

  • Use a check constraint on the column if you want to make sure that it never exceeds a certain value (or, in your case, never drops below zero).
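
In Django ORM terms, a minimal sketch of the "keep the transaction short, update the counter last" pattern might look like this (do_the_actual_work is just a placeholder for whatever the operation really does):

from django.db import transaction
from django.db.models import F

def perform_operation(instance):
    # Keep the transaction as short as possible and touch the quota row
    # as the very last statement, so the row lock is held only briefly.
    with transaction.atomic():
        do_the_actual_work(instance)  # placeholder for the real operation
        Quota.objects.filter(obj=instance).update(count=F('count') - 1)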

Laurenz Albe

From my experience, using isolation level serializable is indeed the way to go, and it seems to be the way the Postgres docs prefer for avoiding deadlocks, at the cost of:

This monitoring does not introduce any blocking beyond that present in repeatable read, but there is some overhead to the monitoring, and detection of the conditions which could cause a serialization anomaly will trigger a serialization failure.

The monitoring of read/write dependencies has a cost, as does the restart of transactions which are terminated with a serialization failure,

but still often better than select for update:

but balanced against the cost and blocking involved in use of explicit locks and SELECT FOR UPDATE or SELECT FOR SHARE, Serializable transactions are the best performance choice for some environments.

So you'd be better off using serializable, but that implies implementing a "retry" mechanism on the client side, which in the case of a decrement implemented in SQL (as you did with F('count') - 1) should just mean re-running your code if you get a 40001 error from Postgres during commit.
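
A rough sketch of such a retry loop, mirroring the question's approach of setting the isolation level inside transaction.atomic(), and assuming the psycopg2 driver (which exposes the SQLSTATE as pgcode on the wrapped exception), might look like this:

from django.db import connection, transaction
from django.db.models import F
from django.db.utils import OperationalError

def decrement_with_retry(instance, attempts=3):
    for attempt in range(attempts):
        try:
            with transaction.atomic():
                with connection.cursor() as cursor:
                    cursor.execute('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE')
                Quota.objects.filter(obj=instance).update(count=F('count') - 1)
            return
        except OperationalError as exc:
            # A serialization failure surfaces as an OperationalError wrapping the
            # driver's error; SQLSTATE 40001 means "roll back and retry".
            if getattr(exc.__cause__, 'pgcode', None) != '40001' or attempt == attempts - 1:
                raise
            # The atomic block has already rolled back, so looping again is safe.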

Florian Klein
  • OK, so do I have this correct: `serializable` avoids deadlocks, but requires implementing retries. `SELECT FOR UPDATE` guarantees atomicity, but at the risk of deadlocks. With `serializable`, the retried code would also have to check that the quota is not down to zero before performing the `update`, which in the case of simple decrements like this would basically act as an atomicity guarantee. Anything I'm missing? – JK Laiho Jul 07 '23 at 11:33
  • You can still get deadlocks with `SERIALIZABLE`. – Laurenz Albe Jul 07 '23 at 14:10

You can try database constraints; that way, the responsibility for keeping your data consistent is taken care of by your database. Django has support for constraints, so you can check the documentation and experiment a little with it:

https://docs.djangoproject.com/en/4.2/ref/models/options/#django.db.models.Options.constraints

In your case it should be something like

class Quota(models.Model):
    obj = models.OneToOneField(AnotherModel, on_delete=models.CASCADE)
    count = models.PositiveIntegerField()

    class Meta:
        constraints = [
            models.CheckConstraint(check=models.Q(count__gte=0), name="count_gte_0"),
        ]

That way, when you save your changes, the database will check your data and raise an error if the check fails.
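
For illustration, a sketch of how the constraint surfaces at runtime (spend_quota is just a made-up helper name): an UPDATE that would push count below zero is rejected by PostgreSQL, and Django raises IntegrityError.

from django.db import IntegrityError
from django.db.models import F

def spend_quota(instance):
    try:
        # The F() expression decrements atomically in the database; if the new
        # value would violate the check constraint, the UPDATE is rejected.
        Quota.objects.filter(obj=instance).update(count=F('count') - 1)
        return True
    except IntegrityError:
        return False  # quota exhausted; the database refused the decrement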

Alexandr Zayets

Something like this will stop your count from going negative:

class Quota(models.Model):
    obj = models.OneToOneField(AnotherModel, on_delete=models.CASCADE)
    count = models.PositiveIntegerField()

    def decrement(self):
        # The count__gt=0 filter makes the UPDATE a no-op once the quota is exhausted.
        result = Quota.objects.select_for_update().filter(id=self.id, count__gt=0).update(count=F('count') - 1)
        return result

Now you can check the result: if it is 0, the filter matched no row, which means there is no row with a count greater than 0 (so the count can basically never go below zero).

If it is 1, the database found a matching row and decremented it by 1.

Every time you call decrement, your code waits for the select_for_update lock on that row to be released (if another transaction holds it), so a deadlock might happen; you can use the nowait=True option, but beware of the exceptions it raises (see the docs).

You can also wrap this whole method in an atomic transaction, and even check whether anything went wrong so that the changes are rolled back to the last savepoint.
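
A sketch of that combination, using nowait=True and a zero check (how you react to a locked row or an exhausted quota is an assumption here), might be:

from django.db import DatabaseError, transaction
from django.db.models import F

def decrement(instance):
    try:
        with transaction.atomic():
            # Lock the row first; nowait=True raises DatabaseError immediately
            # if another transaction already holds the lock.
            quota = Quota.objects.select_for_update(nowait=True).get(obj=instance)
            if quota.count == 0:
                return False  # quota exhausted
            Quota.objects.filter(pk=quota.pk).update(count=F('count') - 1)
            return True
    except DatabaseError:
        # The row is locked elsewhere (or the transaction failed); the caller may retry.
        return False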

Mojtaba
  • Sorry, but this does not address the question. The question seeks a method that is both atomic for concurrent decrements and has no risk of deadlocking. – JK Laiho Jul 05 '23 at 10:19
  • Is there any other code that needs to lock the database or this row and depends on this part? If only this code is used, there won't be any deadlock, but it can make some requests wait a while until the database operations are done. – Mojtaba Jul 05 '23 at 10:31
  • Because, as far as I know, deadlocks happen when two transactions each depend on the other to release a resource, so neither of them releases its lock; that situation won't happen here, because each request eventually releases the row it is working on. – Mojtaba Jul 05 '23 at 10:33