Disclaimer: I do know that there are several similar questions on SO. I think I've read most, if not all of them, but did not find an answer to my real question (see below). I also do know that using Celery or another asynchronous queue system is the best way to handle long-running tasks, or at least to use a cron-managed script. There's also the mod_wsgi documentation about processes and threads, but I'm not sure I got it all right.
The question is:
What are the exact risks/issues involved with using the solutions listed below? Are any of them viable for long-running tasks (OK, even though Celery is better suited)? My question is really more about understanding the internals of WSGI and Python/Django than finding the best overall solution. Issues with blocking threads, unsafe access to variables, zombie processes, and so on.
Let's say:
- my "long_process" is doing something really safe; even if it fails, I don't care
- Python >= 2.6
- I'm using mod_wsgi with Apache (will anything change with uWSGI or gunicorn?) in daemon mode
mod_wsgi conf:
WSGIDaemonProcess NAME user=www-data group=www-data threads=25
WSGIScriptAlias / /path/to/wsgi.py
WSGIProcessGroup %{ENV:VHOST}
I figured that these are the options available to launch separate processes (meant in a broad sense) to carry out a long-running task while quickly returning a response to the user:
os.fork
import os
if os.fork() == 0:
    long_process()
else:
    return HttpResponse()
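From what I've read, the naked fork above has two problems: the child falls back into Django's response handling once long_process() finishes, and the parent never wait()s for it, so the child lingers as a zombie. The classic workaround seems to be a double fork, something like this sketch (spawn_detached is just a name I made up, not anything mod_wsgi provides):

import os

def spawn_detached(func):
    pid = os.fork()
    if pid == 0:
        # first child: start a new session and fork again, so the
        # grandchild is adopted and reaped by init, not by Apache
        os.setsid()
        if os.fork() == 0:
            try:
                func()  # grandchild: do the long work
            finally:
                # exit hard so we never fall back into the
                # parent's WSGI request-handling code
                os._exit(0)
        os._exit(0)  # first child exits immediately
    os.waitpid(pid, 0)  # parent: reap the short-lived first child

# in the view:
#     spawn_detached(long_process)
#     return HttpResponse()

Even then, the grandchild inherits every open file descriptor of the daemon process (sockets, log files, database connections), which I guess is part of what makes a bare fork inside a web server risky.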
subprocess
import sys
import subprocess

p = subprocess.Popen([sys.executable, '/path/to/script.py'],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
(where the script is likely to be a manage.py command)
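For that variant, here's a sketch of what I mean (the command name do_long_task and the log path are made up). I've also read that the PIPE above is itself a risk: if nothing ever reads p.stdout, the child blocks as soon as the pipe buffer fills up, so writing to a file seems safer:

import sys
import subprocess

# close_fds keeps Apache's sockets out of the child; logging to a
# file avoids the full-PIPE deadlock, since nothing reads the pipe
with open('/path/to/long_task.log', 'ab') as logfile:
    subprocess.Popen([sys.executable, '/path/to/manage.py', 'do_long_task'],
                     stdout=logfile,
                     stderr=subprocess.STDOUT,
                     close_fds=True)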
threads
import threading
t = threading.Thread(target=long_process,
                     args=args,
                     kwargs=kwargs)
t.daemon = True
t.start()
return HttpResponse()
NB.
Due to the Global Interpreter Lock, in CPython only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
The main thread will return quickly (the HttpResponse). Will the spawned long-running thread block WSGI from doing something else for another request?!
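If one thread per request turns out to be wasteful, I could also imagine a single worker thread consuming a Queue, started when the daemon process first imports the module. A sketch (names made up; the worker would still die whenever mod_wsgi recycles the process):

import threading
import Queue  # 'queue' on Python 3

_tasks = Queue.Queue()

def _worker():
    # run queued jobs one at a time, forever
    while True:
        func, args, kwargs = _tasks.get()
        try:
            func(*args, **kwargs)
        except Exception:
            pass  # long_process is "really safe" per the assumptions above
        finally:
            _tasks.task_done()

_worker_thread = threading.Thread(target=_worker)
_worker_thread.daemon = True  # don't block interpreter shutdown
_worker_thread.start()

# in the view:
#     _tasks.put((long_process, args, kwargs))
#     return HttpResponse()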
multiprocessing
from multiprocessing import Process
p = Process(target=_bulk_action, args=(action, objs))
p.start()
return HttpResponse()
This should solve the thread concurrency issue, shouldn't it?
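It does sidestep the GIL, but two process-level details still worry me: the forked child inherits the parent's open database connection, and the finished child stays a zombie until something reaps it (multiprocessing seems to do that only lazily, the next time a Process is created). The usual Django advice I've seen is to close the connection before forking, so both sides transparently reconnect on their next query; a sketch:

from multiprocessing import Process
from django.db import connection

# close the request's DB connection *before* forking, so parent and
# child don't share one socket to the database; each side reconnects
# on its next query
connection.close()

p = Process(target=_bulk_action, args=(action, objs))
p.start()
return HttpResponse()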
So those are the options I could think of. What would work and what wouldn't, and why?