40

I have a Django model that can only be accessed using get_or_create(session=session), where session is a foreign key to another Django model.

Since I am only accessing through get_or_create(), I would imagine that I would only ever have one instance with a key to the session. However, I have found multiple instances with keys to the same session. What is happening? Is this a race condition, or does get_or_create() operate atomically?

Mantas Vidutis
  • 16,376
  • 20
  • 76
  • 92

4 Answers4

55

NO, get_or_create is not atomic.

It first asks the DB if a satisfying row exists; database returns, python checks results; if it doesn't exist, it creates it. In between the get and the create anything can happen - and a row corresponding to the get criteria be created by some other code.

For instance wrt to your specific issue if two pages are open by the user (or several ajax requests are performed) at the same time this might cause all get to fail, and for all of them to create a new row - with the same session.

It is thus important to only use get_or_create when the duplication issue will be caught by the database through some unique/unique_together, so that even though multiple threads can get to the point of save(), only one will succeed, and the others will raise an IntegrityError that you can catch and deal with.

If you use get_or_create with (a set of) fields that are not unique in the database you will create duplicates in your database, which is rarely what you want.

More in general: do not rely on your application to enforce uniqueness and avoid duplicates in your database! THat's the database job! (well unless you wrap your critical functions with some OS-valid locks, but I would still suggest to use the database).

With thes warnings, used correctly get_or_create is an easy to read, easy to write construct that perfectly complements the database integrity checks.

Refs and citations:

mipadi
  • 398,885
  • 90
  • 523
  • 479
Stefano
  • 18,083
  • 13
  • 64
  • 79
  • 2
    Many _NOs_, and justified (good answer regarding the prerequisites), but hey, in the end, it means that _get_or_create_ isn't always bad, and if all criteria are met, then it just works... missed that part in your answer. ;) – class stacker Feb 01 '13 at 10:15
  • @ClassStacker indeed - there were already answers and I wanted to complement them. But you're right, it's absolutely not bad, actually it's some useful syntactic sugar! – Stefano Feb 01 '13 at 16:12
  • I totally understand your motivation for complementing the other posts. Interestingly, though, I was looking for a discussion of _get_or_create_ which comes to a conclusion. But I found only discussions with half-true claims and additional _look-at-that-blog-post_ links, so when I came across your A, I found it was short and true, but sadly, without a conclusion for someone who's looking for a bit of orientation. Hence the comment. ;) – class stacker Feb 05 '13 at 08:05
  • @ClassStacker i understand your point - but there was an already accepted solution :) I edited mine to make it more comprehensive. Mileage might vary with the database, but not much. With respect with your precise point, I would need to know how your "session" key is generated - and also why don't you use [django db-stored sessions](https://docs.djangoproject.com/en/dev/topics/http/sessions/#using-database-backed-sessions) ! – Stefano Feb 05 '13 at 08:48
  • Cool edit. -- I'm not the OP, so can't update you on the OP's motivation. – class stacker Feb 05 '13 at 09:04
  • If there is no unique then one may use `.update_or_create()`. – Suor Mar 19 '21 at 03:26
17

Actualy it's not thread-safe, you can look at the code of the get_or_create method of the QuerySet object, basicaly what it does is the following :

try:
    return self.get(**lookup), False
except self.model.DoesNotExist:
    params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
    params.update(defaults)
    obj = self.model(**params)
    sid = transaction.savepoint(using=self.db)
    obj.save(force_insert=True, using=self.db)
    transaction.savepoint_commit(sid, using=self.db)
    return obj, True

So two threads might figure-out that the instance does not exists in the DB and start creating a new one, before saving them consecutively.

recamshak
  • 854
  • 7
  • 8
  • 1
    I don't see how you conclude that it's not thread-safe; at least your code snippet does not fully reflect the current implementation (and as far as I know, not even previous). It's true that the implementation used to have an issue, but it has been solved before @mvid asked. You left out the additional _try_ and _get_, so if the underlying database has a unique index for the field(s) you which are used for the _get_ part, then only one Thread will _insert_, the others will _get_; and if all threads use the same (or reasonable enough) defaults, all goes well. – class stacker Feb 01 '13 at 10:12
  • 1
    @classstacker as you mentioned, the latest get_or_create() method code is NOT thread safe if the underlying database has not an unique index in the field(s) used in the query. This can be an issue when the model is defined in a 3rd party library whose code you do not want to change. I just ran into that with Django Q. – Alan Evangelista Jul 08 '21 at 10:14
8

Threading is one problem, but get_or_create is broken for any serious usage in default isolation level of MySQL:

Community
  • 1
  • 1
Tomasz Zieliński
  • 16,136
  • 7
  • 59
  • 83
0

I was having this problem with a view that calls get_or_create.

I was using Gunicorn with multiple workers, so to test it I changed the number of workers to 1 and this made the problem disappeared.

The simplest solution I found was to lock the table for access. I used this decorator to do the lock per view (for PostgreSQL):

http://www.caktusgroup.com/blog/2009/05/26/explicit-table-locking-with-postgresql-and-django/

EDIT: I wrapped the lock statement in that decorator in a try/except to deal with DB engines with no support for it (SQLite while unit testing in my case):

try:
    cursor.execute('LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock))
except DatabaseError: 
    pass
AJJ
  • 7,365
  • 7
  • 31
  • 34
  • Not sure why this was downvoted. This did the trick for me. Thanks. – Cerin Jul 06 '14 at 01:49
  • 8
    Locking the entire table is a huge performance bottleneck. It's incredibly heavy handed, you might as well turn the entire website off while you're serving a single request. It will work for very small applications and will not scale. – Travis D Nov 16 '17 at 16:30