For our blogging platform, we have an "Article" model that contains an "updated" datetime field:

class Article(models.Model):
    updated = models.DateTimeField(null=True, blank=True)
    ...

When an article is opened by any visitor for the first time in 24 hours, we run some time-consuming calculations on different model fields and then save the model to the database. At the same time, we set the "updated" field to the current datetime.now().

# timedelta.days counts whole days, so ">= 1" means at least 24 hours have passed
if (datetime.now() - article.updated).days >= 1:
    # do some time-consuming calculations
    article.updated = datetime.now()
    article.save()

When an article is requested by several visitors more or less simultaneously, the time-consuming operations of the first request have not finished yet, so the once-per-day operation starts again on the same object (article.updated still has the old value). Would it help to call article.save() additionally right before starting the calculations? Or is this data held back from the database until the request has finished?

Simon Steinberger

3 Answers

Use the queryset method select_for_update(), introduced in Django 1.4, which performs row-level locking in the database. All matched rows are locked until the end of the transaction block, so other transactions cannot change them or acquire locks on them until then. There are a few gotchas specific to each database backend, so read up on them and test before relying on it completely.

Another approach, independent of the database backend, is to give your model a locked boolean attribute. It is not very neat, but it is workable. See What is the simplest way to lock an object in Django.
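To make the locked-flag idea concrete, here is a minimal sketch using the standard library's sqlite3 instead of Django, so it runs on its own. The table layout, the locked column and the helper names try_claim and release are assumptions for illustration only. The point is that the flag must be set with a single conditional UPDATE; reading the flag first and writing it afterwards would reintroduce the same race. In Django, a roughly equivalent check would be whether Article.objects.filter(pk=article_id, locked=False).update(locked=True) returns 1, assuming such a locked field exists on the model.

```python
import sqlite3


def try_claim(conn, article_id):
    """Atomically flip locked from 0 to 1; True only for the caller that won."""
    cur = conn.execute(
        "UPDATE article SET locked = 1 WHERE id = ? AND locked = 0",
        (article_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn, article_id):
    """Clear the flag so the article can be claimed again."""
    conn.execute("UPDATE article SET locked = 0 WHERE id = ?", (article_id,))
    conn.commit()


# Hypothetical one-column stand-in for the Article table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, locked INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO article (id) VALUES (1)")
conn.commit()

first = try_claim(conn, 1)   # wins the flag and would run the calculations
second = try_claim(conn, 1)  # flag already set, so this caller skips the work
release(conn, 1)
third = try_claim(conn, 1)   # claimable again after the release
```

One thing to watch with this approach: if the process dies between claiming and releasing, the flag stays set, so some kind of expiry or cleanup is probably needed.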

Pratik Mandrekar
  • Thanks Pratik! We're trying select_for_update() ... give us a few days to see if it works. The concurrent requests happen each couple of days ... – Simon Steinberger Sep 21 '12 at 15:57

Some suggestions:

  • It would be better to move the time-consuming calculations out of the request-response cycle and into the background. A message queue could be used here (such as the popular Celery). I think it's the best solution, but it requires some additional administration that could be overkill for simple tasks;
  • If you use a cache, you could set a flag there to mark the object as locked. If the cache is shared between interpreters (like memcached), it will work even if many Python interpreters run your app;
  • You could schedule the update procedure (using cron and a custom Django management command) to update all objects that were last updated more than 24 hours ago. It will work unless you have a huge number of objects and a considerable processing time.
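The cache-flag suggestion could look roughly like the sketch below. Django's cache API has add(), which stores a key only if it is not already present and reports whether it did, so it can serve as a simple lock. To keep the example runnable without Django, InMemoryCache is a hypothetical stand-in that mimics only add() and delete(); in a real project you would pass django.core.cache.cache instead. The key format, the timeout and the name recalc_once are assumptions as well.

```python
import time


class InMemoryCache:
    """Stand-in for a shared cache; mimics only add() and delete()."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout=None):
        # Store only if the key is absent or expired; report whether we stored it.
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and (entry[1] is None or entry[1] > now):
            return False
        expires = None if timeout is None else now + timeout
        self._data[key] = (value, expires)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def recalc_once(cache, article_id, recalc, timeout=600):
    """Run recalc() only if no other worker holds the flag for this article."""
    key = "article-recalc-%s" % article_id
    if not cache.add(key, "1", timeout):
        return False  # someone else is already working on this article
    try:
        recalc()
    finally:
        cache.delete(key)  # always release, even if recalc() raises
    return True


cache = InMemoryCache()
calls = []


def slow_recalc():
    # Simulates a second request arriving while the first is still running.
    calls.append("outer")
    calls.append(recalc_once(cache, 42, lambda: calls.append("inner")))


ran = recalc_once(cache, 42, slow_recalc)
```

The flag only keeps two workers from running at the same time; the check on article.updated is still needed to decide whether the work is due at all. The timeout is a safety net so that a crashed worker does not block the article forever.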
demalexx
  • Thanks, I like and appreciate your suggestions. However, we really do have a huge amount of objects, which is why we had to replace our cronjob with this "real time" system :) The cronjob would have had to run so often, that it just wouldn't make sense any longer ... – Simon Steinberger Sep 23 '12 at 09:41
  • @Nasmon, then message queues would be the best solution, if you agree to spend some time setting it up :) – demalexx Sep 23 '12 at 09:44
  • I've put Celery on our todo list .. sounds really good! For the moment, and as a quick solution, select_for_update works well. – Simon Steinberger Sep 24 '12 at 08:26

Short version:

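from datetime import datetime
from django.db import transaction

# On Django 1.6+ use @transaction.atomic; commit_on_success was removed in 1.8.
@transaction.commit_on_success
def update_article(article_id):
    # Blocks here while another transaction holds the lock on this row.
    article = Article.objects.select_for_update().get(pk=article_id)
    if (datetime.now() - article.updated).days >= 1:
        # do some time-consuming calculations
        article.updated = datetime.now()
        article.save()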

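select_for_update() locks the database row (the article with ID article_id). The row is unlocked at the end of the transaction, which is at the end of the function, since update_article() is wrapped by @transaction.commit_on_success. On backends that support row locks, a concurrent call waits at the get() until then and therefore reads the new value of updated.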

PS: select_for_update() is available since Django 1.4.
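Why the check has to happen after the lock is taken can be shown with a single-process analogy. This is not Django code: a threading.Lock plays the role of the row lock, a dict plays the role of the database row, and all names are made up for the illustration. Five "requests" arrive at once, but only the first one to get the lock does the work, because the others read the timestamp only after the lock is theirs.

```python
import threading
import time
from datetime import datetime, timedelta

row_lock = threading.Lock()  # plays the role of the database row lock
article = {"updated": datetime.now() - timedelta(days=2)}  # stale "row"
runs = []  # records which thread actually did the work


def update_article():
    with row_lock:  # like select_for_update(): wait until the lock is free
        # Read the timestamp only after the lock is held.
        if datetime.now() - article["updated"] >= timedelta(days=1):
            time.sleep(0.05)  # stands in for the time-consuming calculations
            runs.append(threading.current_thread().name)
            article["updated"] = datetime.now()
    # Leaving the block is the analogue of the transaction committing.


threads = [threading.Thread(target=update_article) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If the timestamp were read before acquiring the lock, all five threads would see the stale value and each would run the calculations in turn, which is the behaviour described in the question.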

Sdra
  • "select_for_update" (accepted answer) does work. But it's nice to have an alternative approach. Thank you :-) – Simon Steinberger Apr 15 '13 at 20:37
  • thx ;) ... but actually I think without the ``@transaction.commit_on_success`` decorator ``select_for_update()`` won't work. ``select_for_update()`` should only make sense within a transaction. That's why I added also this answer... more comments for confirming this would be welcome! – Sdra Apr 16 '13 at 10:18
  • Our project code has changed a lot since then, so - unfortunately - I can't confirm it. However, Django's docs don't mention the decorator requirement: https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-for-update ... – Simon Steinberger Apr 16 '13 at 11:32
  • 1
    @Nasmon: No, indeed, they don't mention the decorator but the transaction, since you could implement a transaction in different ways (one of which is the decorator)... I found it out yesterday, without the decorator the select_for_update was not working! – Sdra Apr 16 '13 at 13:17