In a Django (Python) app, I launch jobs with Celery (a distributed task queue). Each launched job returns an object (let's call it an instance of class X) that lets me check on the job and retrieve its return value or any errors it raised.
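To make this concrete, here's a minimal sketch of what I mean (the module layout, task name, and broker/backend URLs are all placeholders I've invented; concretely, I believe this handle is Celery's AsyncResult):

```python
# tasks.py -- a minimal example task
from celery import Celery

app = Celery('myapp',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def add(x, y):
    return x + y

# elsewhere (e.g. in a Django view): launching the job returns the
# handle, my "class X"
result = add.delay(2, 3)   # a celery.result.AsyncResult
print(result.id)           # unique task id
print(result.ready())      # has the job finished yet?
```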
Several people (someday, I hope) will use this web interface at the same time; therefore, several instances of class X may exist at once, each corresponding to a job that is queued or running in parallel. It's hard to find a way to hold onto these X objects, because I can't use a global variable (say, a dictionary that lets me look up each X object by key): Celery uses separate processes, not just separate threads, so each process would modify its own copy of the global table, causing mayhem.
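The dead end looks roughly like this (reusing the add task from the sketch above):

```python
# the naive approach: a module-level registry -- broken across processes
from tasks import add  # the example task from the sketch above

JOBS = {}  # maps my own job key -> the X handle

def launch(key, x, y):
    JOBS[key] = add.delay(x, y)   # stored only in *this* process's memory

def check(key):
    return JOBS[key].ready()      # KeyError when called from another process
```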
Subsequently, I received the great advice to use memcached to share state across the tasks. I got it working and was able to set and get integer and string values between processes.
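For example, this much works fine between processes (assuming the python-memcached client; the key names are arbitrary):

```python
import memcache  # python-memcached

mc = memcache.Client(['127.0.0.1:11211'])
mc.set('counter', 42)
mc.set('greeting', 'hello')
print(mc.get('counter'))    # 42, even from a different process
print(mc.get('greeting'))   # 'hello'
```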
The trouble is this: after a great deal of debugging today, I learned that memcached's set and get don't seem to work for class instances. My best guess is that, under the hood, memcached serializes (pickles) objects before storing them; class X (understandably) can't survive that, because it points at live data (the status of the job), so the serialized copy may be stale (i.e. it may point to the wrong place) by the time it is loaded again.
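Concretely, the failing round trip looks like this (reusing mc and add from the sketches above; the key name is arbitrary):

```python
result = add.delay(2, 3)
mc.set('job-1', result)    # store the live handle...
handle = mc.get('job-1')   # ...but the handle doesn't survive the trip
if handle is not None:
    print(handle.ready())  # this is where things broke in my tests
```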
Attempts to use a SQLite database were similarly fruitless: not only could I not figure out how to store these objects in database fields (via my Django models.py), I'd also be stuck with the same problem, since the handles of the launched jobs need to stay in RAM somehow (or rely on some fancy OS tricks underneath) so that they update as the jobs finish or fail.
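For reference, the Django-side attempt was along these lines (field names invented); the sticking point is what could possibly go in the handle column:

```python
# models.py -- sketch of the abandoned attempt
from django.db import models

class Job(models.Model):
    key = models.CharField(max_length=64, unique=True)
    created = models.DateTimeField(auto_now_add=True)
    # handle = ???  -- no model field can hold a live X object; at best I
    # could store a pickled snapshot, which would immediately go stale
```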
My best guess is that (despite the advice that thankfully got me this far) I should be launching each job through some external queue (for instance Sun/Oracle Grid Engine). However, I couldn't come up with a good way of doing that without using a system call, which I suspect is bad style (and potentially insecure).
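The only way I could see to do it is something like the following, which is exactly what worries me (the script path is hypothetical, and I'd still have to parse the job id out of qsub's stdout and track it myself):

```python
import subprocess

# submit a job to Sun Grid Engine via its command-line tool
out = subprocess.check_output(['qsub', '/path/to/job_script.sh'])
```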
How do you keep track of jobs that you launch in Django or django-celery? Do you launch them by simply putting the job arguments into a database and then having another process poll the database and run the jobs?
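I.e., something like the following sketch (the Job model, its extra fields, and the run_job function are all invented for illustration):

```python
# sketch of the "poll the database" pattern; Job is a hypothetical model
# with fields args_json (TextField) and state (CharField)
import json
import time

from myapp.models import Job  # hypothetical app/model

def run_job(job, **kwargs):
    ...  # do the actual work here, then record the outcome
    job.state = 'DONE'
    job.save()

def poll_forever():
    while True:
        for job in Job.objects.filter(state='PENDING'):
            job.state = 'RUNNING'
            job.save()
            run_job(job, **json.loads(job.args_json))
        time.sleep(5)  # poll interval
```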
Thanks a lot for your help, I'm quite lost.