35

This code is supposed to get or create an object and update it if necessary. The code is in production use on a website.

In some cases - when the database is busy - it will throw the exception "DoesNotExist: MyObj matching query does not exist".

# Model:
class MyObj(models.Model):
    thing = models.ForeignKey(Thing)
    owner = models.ForeignKey(User)
    state = models.BooleanField()
    class Meta:
        unique_together = (('thing', 'owner'),)

# Update or create myobj
@transaction.commit_on_success
def create_or_update_myobj(owner, thing, state)
    try:
        myobj, created = MyObj.objects.get_or_create(owner=user,thing=thing)

    except IntegrityError:
        myobj = MyObj.objects.get(owner=user,thing=thing)
        # Will sometimes throw "DoesNotExist: MyObj matching query does not exist"

    myobj.state = state
    myobj.save()

I use an innodb mysql database on ubuntu.

How do I safely deal with this problem?

Hobhouse
  • 15,463
  • 12
  • 35
  • 43

4 Answers4

44

This could be an off-shoot of the same problem as here:

Why doesn't this loop display an updated object count every five seconds?

Basically get_or_create can fail - if you take a look at its source, there you'll see that it's: get, if-problem: save+some_trickery, if-still-problem: get again, if-still-problem: surrender and raise.

This means that if there are two simultaneous threads (or processes) running create_or_update_myobj, both trying to get_or_create the same object, then:

  • first thread tries to get it - but it doesn't yet exist,
  • so, the thread tries to create it, but before the object is created...
  • ...second thread tries to get it - and this obviously fails
  • now, because of the default AUTOCOMMIT=OFF for MySQLdb database connection, and REPEATABLE READ serializable level, both threads have frozen their views of MyObj table.
  • subsequently, first thread creates its object and returns it gracefully, but...
  • ...second thread cannot create anything as it would violate unique constraint
  • what's funny, subsequent get on the second thread doesn't see the object created in the first thread, due to the frozen view of MyObj table

So, if you want to safely get_or_create anything, try something like this:

 @transaction.commit_on_success
 def my_get_or_create(...):
     try:
         obj = MyObj.objects.create(...)
     except IntegrityError:
         transaction.commit()
         obj = MyObj.objects.get(...)
     return obj

Edited on 27/05/2010

There is also a second solution to the problem - using READ COMMITED isolation level, instead of REPEATABLE READ. But it's less tested (at least in MySQL), so there might be more bugs/problems with it - but at least it allows tying views to transactions, without committing in the middle.

Edited on 22/01/2012

Here are some good blog posts (not mine) about MySQL and Django, related to this question:

http://www.no-ack.org/2010/07/mysql-transactions-and-django.html

http://www.no-ack.org/2011/05/broken-transaction-management-in-mysql.html

Community
  • 1
  • 1
Tomasz Zieliński
  • 16,136
  • 7
  • 59
  • 83
3

Your exception handling is masking the error. You should pass a value for state in get_or_create(), or set a default in the model and database.

Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
  • 1
    At the time I run create_or_update_myobj the 'owner' might already have a 'thing' in a different 'state'. In that case I need to get the existing 'thing' and change the 'state'. – Hobhouse Feb 10 '10 at 08:52
  • 1
    Or it might not have *any* state because there is no such record, at which point it tries to create a new record, at which point it promptly implodes. – Ignacio Vazquez-Abrams Feb 10 '10 at 08:59
  • Interesting, though your blog is private, so can't read the posts. – Stuart Axon Oct 09 '13 at 15:29
  • @Hobhouse @IgnacioVazquez-Abrams You're both half right. You need to pass `state` in with the defaults kwarg https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create – CrazyCasta Jun 13 '16 at 19:22
1

One (dumb) way might be to catch the error and simply retry once or twice after waiting a small amount of time. I'm not a DB expert, so there might be a signaling solution.

SapphireSun
  • 9,170
  • 11
  • 46
  • 59
1

Since 2012 in Django we have select_for_update which lock rows until the end of the transaction.

To avoid race conditions in Django + MySQL under default circumstances:

  • REPEATABLE_READ in the Mysql
  • READ_COMMITTED in the Django

you can use this:

with transaction.atomic():
   instance = YourModel.objects.select_for_update().get(id=42)
   instance.evolve()
   instance.save()

The second thread will wait for the first thread (lock), and only if first is done, the second will read data saved by first, so it will work on updated data.

Then together with get_or_create:

def select_for_update_or_create(...):
    instance = YourModel.objects.filter(
        ...
    ).select_for_update().first()

    if order is None:
        instnace = YouModel.objects.create(...)

    return instance

The function must be inside transaction block, otherwise, you will get from Django: TransactionManagementError: select_for_update cannot be used outside of a transaction


Also sometimes it's good to use refresh_from_db() In case like:

instance = YourModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.attr = response.something

you'd like to see:

instance = MyModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.refresh_from_db()  # 3
instance.attr = response.something

and that # 3 will reduce a lot a time window of possible race conditions, thus chance for that.

Sławomir Lenart
  • 7,543
  • 4
  • 45
  • 61