26

In my Django app very often I need to do something similar to get_or_create(). E.g.,

User submits a tag. Need to see if that tag already is in the database. If not, create a new record for it. If it is, just update the existing record.

But looking into the doc for get_or_create() it looks like it's not threadsafe. Thread A checks and finds Record X does not exist. Then Thread B checks and finds that Record X does not exist. Now both Thread A and Thread B will create a new Record X.

This must be a very common situation. How do I handle it in a threadsafe way?

Continuation
  • One of the two threads will get a duplicate record error and an exception. There won't be duplicate data. – S.Lott Jul 05 '11 at 17:33

4 Answers

49

Since 2013 or so, get_or_create is atomic, so it handles concurrency nicely:

This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level for the kwargs used in a get_or_create call (see unique or unique_together), this method is prone to a race-condition which can result in multiple rows with the same parameters being inserted simultaneously.

If you are using MySQL, be sure to use the READ COMMITTED isolation level rather than REPEATABLE READ (the default), otherwise you may see cases where get_or_create will raise an IntegrityError but the object won’t appear in a subsequent get() call.

From: https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create
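For the MySQL note above, a minimal settings sketch follows. It assumes a Django version whose MySQL backend accepts an `isolation_level` entry in `OPTIONS`; the database name is a placeholder, and credentials and host are left out.

```python
# settings.py fragment (sketch; "mydb" is a hypothetical database name)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "OPTIONS": {
            # Ask the MySQL backend to run its connections at READ COMMITTED
            # instead of MySQL's own default, REPEATABLE READ.
            "isolation_level": "read committed",
        },
    }
}
```

If your version does not support this option, the isolation level has to be set on the MySQL server or per session instead.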

Here's an example of how you could do it:

Define a model with either unique=True:

from django.db import models

class MyModel(models.Model):
    slug = models.SlugField(max_length=255, unique=True)
    name = models.CharField(max_length=255)

# user_slug and user_name stand for the values submitted by the user
obj, created = MyModel.objects.get_or_create(slug=user_slug, defaults={"name": user_name})

... or by using unique_together:

class MyModel(models.Model):
    prefix = models.CharField(max_length=3)
    slug = models.SlugField(max_length=255)
    name = models.CharField(max_length=255)

    class Meta:
        unique_together = ("prefix", "slug")

# user_prefix, user_slug and user_name stand for the values submitted by the user
obj, created = MyModel.objects.get_or_create(prefix=user_prefix, slug=user_slug, defaults={"name": user_name})

Note how the non-unique fields are in the defaults dict, NOT among the unique fields in get_or_create. This will ensure your creates are atomic.

Here's how it's implemented in Django: https://github.com/django/django/blob/fd60e6c8878986a102f0125d9cdf61c717605cf1/django/db/models/query.py#L466 - It tries to create the object, catches a possible IntegrityError, and returns the existing row in that case. In other words: atomicity is handled in the database.
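To make that flow concrete without a Django project, here is a rough sketch of the same idea using the standard-library sqlite3 module. It is not Django's code: the `tag` table and this `get_or_create` helper are hypothetical stand-ins. The lookup uses only the unique column, and `defaults` is used only when inserting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (slug TEXT UNIQUE NOT NULL, name TEXT NOT NULL)")


def get_or_create(conn, slug, defaults):
    """Return (row, created), looking up by the unique column only."""
    select = "SELECT slug, name FROM tag WHERE slug = ?"
    row = conn.execute(select, (slug,)).fetchone()
    if row is not None:
        return row, False
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tag (slug, name) VALUES (?, ?)",
                (slug, defaults["name"]),
            )
    except sqlite3.IntegrityError:
        # Another writer inserted the same slug first: return their row.
        return conn.execute(select, (slug,)).fetchone(), False
    return conn.execute(select, (slug,)).fetchone(), True


first = get_or_create(conn, "django", {"name": "Django"})
second = get_or_create(conn, "django", {"name": "A different name"})
print(first)
print(second)
```

The second call finds the existing row, so its `defaults` are ignored. Had `name` been part of the lookup instead, the lookup would miss and the insert would hit the unique constraint.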

cdosborn
Emil Stenström
11

This must be a very common situation. How do I handle it in a threadsafe way?

Yes.

The "standard" solution in SQL is to simply attempt to create the record. If it works, that's good. Keep going.

If an attempt to create a record gets a "duplicate" exception from the RDBMS, then do a SELECT and keep going.

Django, however, has an ORM layer, with its own cache. So the logic is inverted to make the common case work directly and quickly and the uncommon case (the duplicate) raise a rare exception.

S.Lott
  • I've experienced duplicate entries in a postgres database that should have been unique when I was using `get_or_create` in a view method that was getting concurrent requests, I think this is a valid concern. – A Lee Jul 05 '11 at 18:35
  • @A Lee: With unique index constraints correctly defined, a duplicate should not be possible. How were you able to circumvent the unique index constraint? – S.Lott Jul 05 '11 at 18:54
  • Ah, that would've fixed the issue now that I think about it more clearly. The `get_or_create` used multiple fields and I moved it to a different execution path instead of leaving it in the view and adding a unique constraint across the multiple model fields. – A Lee Jul 06 '11 at 04:14
  • @A Lee: You can still fix it. – S.Lott Jul 06 '11 at 10:27
  • @S.Lott: so when get_or_create attempts to create a record gets a "duplicate" exception, does it automatically try to "get" the record instead? Or do I have to do that in my code? What type of exception would Django throw in that case? I don't see any "duplicate" exception in django.core.exceptions.py – Continuation Jul 06 '11 at 18:47
  • @Continuation: The underlying database raises an exception which is passed through the Django ORM. Try it from the command-line to see what happens. – S.Lott Jul 06 '11 at 18:53
3

Try the transaction.commit_on_success decorator on the callable where you call get_or_create(**kwargs). (commit_on_success was removed in Django 1.8; transaction.atomic is its replacement in newer versions.)

"Use the commit_on_success decorator to use a single transaction for all the work done in a function. If the function returns successfully, then Django will commit all work done within the function at that point. If the function raises an exception, though, Django will roll back the transaction."

Apart from that, in concurrent calls to get_or_create, both threads first try to get the object using the arguments passed in (except for the "defaults" argument, a dict used only in the create call when get() finds nothing). If the get fails, both threads try to create the object, which results in duplicate objects unless a unique or unique_together constraint on the field(s) used in the get() call is enforced at the database level.
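The interleaving described above can be replayed by hand, without real threads, so the outcome is deterministic. This is only an illustration using the standard-library sqlite3 module; the `tag` table and the `run_interleaved` helper are hypothetical.

```python
import sqlite3


def run_interleaved(unique):
    """Replay the race by hand: both workers check first, then both insert."""
    conn = sqlite3.connect(":memory:")
    constraint = "UNIQUE" if unique else ""
    conn.execute(f"CREATE TABLE tag (name TEXT {constraint})")

    def exists(name):
        query = "SELECT 1 FROM tag WHERE name = ?"
        return conn.execute(query, (name,)).fetchone() is not None

    # Step 1: worker A and worker B both look, and both see nothing.
    a_missing = not exists("django")
    b_missing = not exists("django")

    # Step 2: both act on their now stale answer and insert.
    rejected = 0
    for missing in (a_missing, b_missing):
        if missing:
            try:
                conn.execute("INSERT INTO tag (name) VALUES (?)", ("django",))
            except sqlite3.IntegrityError:
                rejected += 1  # the database refused the duplicate

    rows = conn.execute("SELECT COUNT(*) FROM tag").fetchone()[0]
    return rows, rejected


without_constraint = run_interleaved(unique=False)
with_constraint = run_interleaved(unique=True)
print(without_constraint)  # (rows, rejected inserts) with no constraint
print(with_constraint)     # (rows, rejected inserts) with UNIQUE
```

Without the constraint both inserts succeed and the table ends up with two rows. With it, the second insert is rejected and the caller can fall back to fetching the existing row.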

This is similar to the post "How do I deal with this race condition in django?"

Community
vijay shanker
2

So many years have passed, but nobody has written about threading.Lock. If you don't have the opportunity to make migrations for unique together, for legacy reasons, you can use locks or threading.Semaphore objects. Here is the pseudocode:

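from concurrent.futures import ThreadPoolExecutor
from threading import Lock

# One lock shared by every thread in this process
_lock = Lock()


def get_staff(data: dict):
    # Hold the lock so only one thread at a time runs the check-then-create
    with _lock:
        staff, created = MyModel.objects.get_or_create(**data)
        return staff


# MyModel and get_list_of_some_data() are placeholders for your own model and data source
with ThreadPoolExecutor(max_workers=50) as pool:
    pool.map(get_staff, get_list_of_some_data())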
Mastermind
  • What if, I've done `MyModel.objects.get_or_create(**data)` in other places. How can you avoid race condition in that scenario? – Sagar Adhikari Jul 09 '21 at 14:43
  • I think we need to avoid doing `MyModel.objects.get_or_create(**data)` in other places, and always use this function for the purpose. Am I correct? – Sagar Adhikari Jul 09 '21 at 14:47
  • You can override get_or_create with a thread lock in the QuerySet manager, like here: https://code.djangoproject.com/attachment/ticket/13105/django-1.1.1-thread-safe.patch However, this problem appears only in highly concurrent places like pools. If you are using a pool with locks in a background task, for example, alongside a native Django view, there's a very small chance of a race condition. But if you do hit it, you should think about global locks, such as redis locks, cache values, etc. – Mastermind Jul 10 '21 at 20:47
  • Avoiding it also helps, but if you really need get_or_create, consider the example above. Be careful: a global lock in a view can block your workers and slow down your web app. – Mastermind Jul 10 '21 at 20:48