20

I need to populate a SQLite database every few minutes in Django, but I want to serve stale data until the data is available for the database to be updated. (i.e. I don't want to block for the data to be gathered; the only time I can block is if there is a lock on the database, during which I have no choice.)

I also don't want to install a separate program or library.

How would I go about setting up another thread that could call save() on a bunch of models, without running into threading issues?

user541686
  • 4
    "I also don't want to install a separate program or library" Often an expensive policy. Celery does this for you. Why not just install celery? – S.Lott Jul 06 '11 at 20:50
  • 3
    @S.Lott: Because it's going to be on someone's server and I want to avoid dumping things on the server as much as I can. Isn't it a little overkill for just a separate thread? – user541686 Jul 06 '11 at 20:58
  • "Isn't it a little overkill for just a separate thread?" No. "dumping things on the server"? One installation is dumping? I don't get the objection. Do you want it to work, or do you want to write, test and debug a lot of code? – S.Lott Jul 06 '11 at 20:59
  • 1
    @S.Lott: I want it to work, but I don't see why I need an entire **library** just for making a single thread. If you can convince me, I'm all for it, but it really seems unnecessary to me right now, especially since this is in Python (which seems to include a module for, like, everything). – user541686 Jul 06 '11 at 21:02
  • Sadly. It's not a "simple thread". You keep saying that, but it's false. There's no "convincing" you if you keep claiming that it's a "simple thread". If it was actually simple, you would have done it already. – S.Lott Jul 06 '11 at 21:04
  • 2
    @S.Lott: `If it was actually simple, you would have done it already.` Well, I'm actually new to both Python *and* Django, so I didn't imagine it to be that difficult; I just thought I don't know how to do it. I'm still not sure *which* part of it requires the library, though: is it the database lock? Is it the threading itself? Or is it something else that complicates the matter, which I'm forgetting? – user541686 Jul 06 '11 at 21:27
  • You can't just spawn a new thread. Django runs in a web server environment, which means that code normally executes only when the server receives a request, and that it should finish as quickly as possible to avoid hogging the web server's threads. Because code only runs when a request is received, it is also difficult to guarantee that something happens "every few minutes". If I were you, I would write a separate Python script for this and run it through cron. Cron is installed everywhere :-) – AHM Jul 06 '11 at 21:52
  • @Mehrdad: "Well, I'm actually new to both Python and Django". All the more reason to install and use the standard solution that almost everyone else uses: celery. Consider removing your "also don't want to install" from the question, since it's a very, very bad idea especially for a n00b. – S.Lott Jul 06 '11 at 22:05
  • 2
    @AHM: Thanks a lot for the info, though I'm not sure I'll go that route. :) @S.Lott: Your comment is like telling a "n00b" to install Eclipse or Visual Studio so that he can make his first Hello, World program. It might be helpful to have an IDE, but (1) it's overkill, (2) the person will think that he will *always* need an IDE, (3) he'll never know the reason behind the advice. So if instead of calling me a n00b, you actually told me *why* I shouldn't do that (maybe as AHM did?), then I might actually understand what you mean. – user541686 Jul 06 '11 at 22:13
  • @Mehrdad: "it's overkill" False. "the person will think that he will always need an IDE". What? "he'll never know the reason behind the advice" I keep trying to explain and you keep rejecting the explanation. It's not simple. How many different ways do I have to say it? – S.Lott Jul 07 '11 at 00:39
  • 1
    @S.Lott: Yes, using an IDE to write small programs **is** overkill. **Very** overkill. If you think using Eclipse/Visual Studio/etc. is a good introduction to programming, then indeed, I'll probably never understand what you mean about this either, so it likely won't be worth arguing over it. :\ – user541686 Jul 07 '11 at 00:59
  • @Mehrdad: The IDE analogy makes no sense. Even with the explanation. It's not comparable in any way to using Celery. Please stop repeating it. – S.Lott Jul 07 '11 at 01:01
  • 1
    @S.Lott: Well in that case, no need for me to argue; thanks for the input. – user541686 Jul 07 '11 at 01:01
  • @Mehrdad: Using an add-on like celery is like using an RDBMS or using Apache. It's essential. It's nothing like using an IDE. – S.Lott Jul 07 '11 at 01:03

6 Answers

19

If you're looking for a lightweight solution for just executing stuff in background rather than a full-blown task management system, take a look at django-utils. It includes, among other things, an @async function decorator that will make a function execute asynchronously in a separate thread.

Use it like this:

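# Note: `async` became a reserved keyword in Python 3.7,
# so this import only parses on older interpreters.
from djutils.decorators import async

@async
def load_data_async():
    # this will be executed in a separate thread
    load_data()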

Then you can call either the load_data_async function for background execution, or the normal load_data function for blocking execution.

Just make sure to install a version before 2.0, since version 2.0 lacks the @async decorator.

Note: If even installing django-utils would be too much, you can simply download it and include the few required files in your project.
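For readers who cannot install anything at all, a similar decorator can be sketched with only the standard library. This is a minimal illustration of the idea, not the django-utils implementation: the name run_async is an assumption (chosen because `async` is a reserved word on current Python versions), and load_data here is a hypothetical stand-in for the real data-gathering step.

```python
import functools
import threading


def run_async(func):
    """Run the decorated function in a daemon thread and return the Thread."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        thread = threading.Thread(
            target=func, args=args, kwargs=kwargs, daemon=True
        )
        thread.start()
        return thread  # the caller may join() it, or simply ignore it
    return wrapper


results = []


def load_data():
    # Hypothetical stand-in for the slow data-gathering step.
    results.append("fresh data")


@run_async
def load_data_async():
    # Executed in a separate thread.
    load_data()


worker = load_data_async()  # returns immediately
worker.join()               # only needed here so the demo finishes predictably
```

Inside Django, database connections are per-thread, so a function run this way works on its own connection rather than sharing the request's one. The caveats raised in the comments on the question still apply: nothing here guarantees the thread is started on a schedule.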

Nick S
Jaka Jaksic
  • 1
    Unfortunately, djutils doesn't seem to be active anymore. At least readthedocs and github pages were removed. – Olli Nov 23 '13 at 14:46
  • 1
    The readthedocs pages still seem to be missing, @Olli, but [the github pages](https://github.com/karthikrish/django-utils) are visible. – Don Kirkby Dec 10 '14 at 23:29
  • The `readthedocs` link was erroneous : django-utils documentation does exist [here](https://django-utils.readthedocs.org/en/latest/index.html). – PLNech Nov 26 '15 at 12:32
  • 1
    I tried using django-utils but it seems not compatible with Django 1.8, isn't it? – Jose Luis de la Rosa Jan 05 '16 at 01:43
  • I cannot do `from djutils.decorators import async` with Django 1.10 either. It wants to import hashcompat which has been deprecated in Django 1.6. I'd like to file a bug report, but the project seems to be dead :( – Pablo Nov 21 '16 at 01:53
18

Celery.

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

Celery is written in Python, but the protocol can be implemented in any language. It can also operate with other languages using webhooks.

Community
S.Lott
6

Just a quick update on John Lehmann's answer: django-background-task was unmaintained and incompatible with newer Django versions. We updated it and extended it with new features a while ago, and we now maintain the new backward-compatible package on GitHub. The new django-background-tasks app can be downloaded or installed from PyPI.

phi
5

Depends on whether you need the update to look atomic from the point of view of the readers. If you don't mind seeing old and new data together, just create a custom management command that populates the data, and run it every few minutes from cron.

If you need it to look atomic, wrapping all the writes in one SQLite transaction via django.db.transaction should probably provide you with the necessary locks.
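To illustrate the all-or-nothing behaviour, here is a small sketch that uses the standard library's sqlite3 module directly instead of Django's ORM, so it runs on its own. The quotes table and the replace_all helper are made up for the example; in a Django project the equivalent would be to wrap the save() calls in a transaction from django.db.transaction (transaction.atomic() in current Django versions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO quotes VALUES ('OLD', 1.0)")
conn.commit()


def replace_all(conn, rows):
    """Swap the table contents in a single transaction."""
    # `with conn` commits if the block succeeds and rolls back
    # if an exception escapes, so readers never see a half-updated table.
    with conn:
        conn.execute("DELETE FROM quotes")
        conn.executemany("INSERT INTO quotes VALUES (?, ?)", rows)


# A successful refresh replaces the stale rows.
replace_all(conn, [("AAA", 10.0), ("BBB", 20.0)])

# A failing refresh (duplicate primary key) leaves the previous data in place.
try:
    replace_all(conn, [("CCC", 30.0), ("CCC", 31.0)])
except sqlite3.IntegrityError:
    pass

current = conn.execute(
    "SELECT symbol, price FROM quotes ORDER BY symbol"
).fetchall()
```

After the failed second refresh, the table still holds the rows from the first one, which is the "serve stale data until the new data is ready" behaviour the question asks for.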

congusbongus
che
  • I've never used `cron` before, but -- unless I'm misunderstanding this -- isn't that a completely separate program? Does that mean I have to create a separate program, call `cron` from my server, and run that program to update the database? – user541686 Jul 06 '11 at 21:00
  • 1
    cron is a process that runs in background on linux, freebsd and osx. See: en.wikipedia.org/wiki/Cron. – demux Jul 06 '11 at 21:15
  • @Amar: So like I'd previously understood, it's a completely separate program... which means I need to create a new program just for the update. I'm not sure I'm going to go that route, but +1, thanks for the idea anyway. – user541686 Jul 07 '11 at 01:04
  • 3
    @Mehrdad: Almost every system has some sort of task scheduler. Unix-like ones have `cron`, Windows has http://en.wikipedia.org/wiki/Task_Scheduler – che Jul 07 '11 at 11:57
4

Django Background Task is a database-backed work queue for Django, loosely based around Ruby's DelayedJob library.

You decorate functions to create tasks:

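from background_task import background
from django.contrib.auth.models import User

@background(schedule=60)  # run about 60 seconds after the task is queued
def notify_user(user_id):
    # lookup user by id and send them a message
    user = User.objects.get(pk=user_id)
    user.email_user('Here is a notification', 'You have been notified')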

Though you still need something which schedules those tasks. Some benefits include automatic retries for failed tasks, and setting maximum duration for a running task.

This does involve another dependency, but it could be useful to readers without that restriction.

John Lehmann
1

I had the same issue but didn't want to run a service like Celery to solve the problem.

I found posix_spawn on Linux systems. You can write manage.py commands that run in your full Django environment. These commands can be executed in the background with this project.

To pass data back to the website during the run, I use memcached.

https://github.com/lukedupin/django_posix_spawn
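The general idea of launching a command as a separate background process can be sketched with the standard library's subprocess module. This is not the API of the linked project, only a rough illustration of the approach; spawn_background is a made-up helper, and the manage.py command name mentioned in the comment is hypothetical.

```python
import subprocess
import sys


def spawn_background(argv):
    """Start argv as a separate process without waiting for it to finish."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX only: detach from the parent's session
    )


# In a Django project the command might look like
# [sys.executable, "manage.py", "load_data"] (hypothetical command name).
proc = spawn_background([sys.executable, "-c", "print('loading data')"])

# A web view would return right away; waiting here just keeps the demo tidy.
exit_code = proc.wait()
```

Because the work happens in another process, it does not share memory with the web process, which is why results have to come back through something like the database or a cache.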

Luke Dupin