73

The sequence I would like to accomplish:

  1. A user clicks a button on a web page
  2. Some functions in model.py start to run. For example, gathering some data by crawling the internet
  3. When the functions are finished, the results are returned to the user.

Should I start a new thread inside model.py to execute my functions? If so, how do I do this?

Wes Hardaker
Robert
  • What are you trying to accomplish? Maybe you can do that with frontend technologies like AJAX, WebSocket, magic pony... – gipi Jul 11 '13 at 19:27
  • What is magic pony? Can't find it on Google... – Gerard Yin Jul 11 '13 at 19:59
  • Possible duplicate of [Multithreading for Python Django](https://stackoverflow.com/questions/18420699/multithreading-for-python-django) – Nemo Apr 29 '18 at 07:02

3 Answers

78

As shown in this answer, you can use the threading package to perform an asynchronous task. Everyone seems to recommend Celery, but it is often overkill for performing simple but long-running tasks. I think it's actually easier and more transparent to use threading.

Here's a simple example of running a crawler asynchronously:

# views.py
import threading

from django.http import JsonResponse

from .models import Crawl

def startCrawl(request):
    # Create a record to track the crawl, then run it on a daemon thread
    task = Crawl()
    task.save()
    t = threading.Thread(target=doCrawl, args=[task.id])
    t.setDaemon(True)
    t.start()
    return JsonResponse({'id': task.id})

def checkCrawl(request, id):
    # The front end polls this view until the crawl is done
    task = Crawl.objects.get(pk=id)
    return JsonResponse({'is_done': task.is_done, 'result': task.result})

def doCrawl(id):
    task = Crawl.objects.get(pk=id)
    # Do crawling, etc., and store whatever was gathered in `result`

    task.result = result
    task.is_done = True
    task.save()

Your front end can make a request to startCrawl to kick off the crawl, then poll with Ajax requests to checkCrawl, which will return is_done: true and the result once the crawl has finished.
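
For completeness, here is a minimal sketch of the routing and model that the views above assume (the URL patterns and field names are illustrative, not part of the original answer):

# urls.py
from django.urls import path

from . import views

urlpatterns = [
    path('crawl/start/', views.startCrawl),
    path('crawl/check/<int:id>/', views.checkCrawl),
]

# models.py
from django.db import models

class Crawl(models.Model):
    # Minimal fields needed by the views above
    is_done = models.BooleanField(default=False)
    result = models.TextField(blank=True, default='')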


Update for Python3:

The documentation for the threading library recommends passing the daemon property as a keyword argument rather than using the setter:

t = threading.Thread(target=doCrawl, args=[task.id], daemon=True)
t.start()

Update for Python <3.7:

As discussed here, this bug can cause a slow memory leak that can overflow a long-running server. The bug was fixed in Python 3.7 and above.

nbwoodward
  • Wouldn't the process created for serving the web request run until the thread is finished? – Sandeep Balagopal Jul 10 '20 at 00:37
  • @SandeepBalagopal That's a good point, and you're probably right, but you still return a response to the user before that process and the daemon process exit. Since the maximum number of processes is an OS-level issue, I suppose your architecture will determine the limits of this solution's feasibility. A message queue is more robust in that sense, or maybe you could use the queue library https://docs.python.org/3.7/library/queue.html – nbwoodward Jul 13 '20 at 17:39
  • @nbwoodward Does it mean that the worker will not be able to process new requests until the daemon thread finishes? Won't that lead to low throughput (RPS)? – Aniket Singla Oct 05 '20 at 13:21
  • I'm using that exact method for my webpage and (yet) there haven't been any issues. – CutePoison Apr 05 '21 at 20:17
  • Is using Django's ORM to access the database from different threads like this thread-safe? – Flimm Nov 02 '21 at 11:20
  • @Flimm That's a good question. Thread safety concerns accessing values in memory from separate threads. Your question is more related to concurrent database access. Django is inherently built for handling concurrent requests on multiple threads and/or processes that all may access the database. So it seems to me that the ORM should be able to handle the concurrency of the threading library as well. – nbwoodward Nov 08 '21 at 20:54
  • The Django team introduced asynchronous support in v4.0. Would this be relevant, with regard to the async_to_sync etc. methods, when handling databases? [Asynchronous Support](https://docs.djangoproject.com/en/dev/topics/async/) – James Bellaby Mar 23 '22 at 09:56
  • @JamesBellaby I think it's pretty likely there is a good solution using async/await. I'll see if I can find time to experiment with that. Feel free to add an answer or propose an edit to this one if you find time yourself. – nbwoodward Apr 20 '22 at 16:00
  • @AniketSingla Good question. After testing, I find the process will receive new requests even when the old thread hasn't stopped. – ramwin Jun 29 '22 at 08:36
  • Beautiful answer. I also use threads a lot, for example for password resets. This also has the advantage of killing blind attacks, as a reset takes a fixed time whether or not the email exists. – Emacs The Viking Feb 03 '23 at 09:19
35
  1. Yes, it can multi-thread, but generally one uses Celery to do the equivalent. You can read about how in the celery-django tutorial. (A minimal task sketch follows the flows below.)
  2. It is rare that you actually want to force the user to wait for the website, though it's better than risking a timeout.

Here's an example of what you're describing.

User sends request
Django receives => spawns a thread to do something else.
main thread finishes && other thread finishes 
... (later upon completion of both tasks)
response is sent to user as a package.

Better way:

User sends request
Django receives => lets Celery know "hey! do this!"
main thread finishes
response is sent to user
...(later)
user receives balance of transaction 
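
Here is a minimal sketch of the "hey! do this!" hand-off (the task and view names are illustrative, and it assumes a Celery app is already configured for the project):

# tasks.py
from celery import shared_task

from .models import Crawl

@shared_task
def crawl_task(task_id):
    # Runs on a Celery worker, outside the request/response cycle
    task = Crawl.objects.get(pk=task_id)
    # Do crawling, etc.
    task.is_done = True
    task.save()

# views.py
from django.http import JsonResponse

from .models import Crawl
from .tasks import crawl_task

def start_crawl(request):
    task = Crawl()
    task.save()
    crawl_task.delay(task.id)  # queue the work and return immediately
    return JsonResponse({'id': task.id})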
cwallenpoole
  • Celery is overkill for many purposes. Please stop recommending it as the magic bullet for anything that needs to not block request/response. It's like recommending an RDBMS whenever anyone asks how to store a line of text. – Andy Baker Dec 12 '14 at 18:12
  • @andybak Feel free to suggest an alternative. To me, this sounds like a legit use. – cwallenpoole Dec 13 '14 at 04:56
  • Depends on the specifics, but you can just spawn a thread and poll for completion, you can use a simple cron job that checks for tasks, or if you do need more features, you can use one of several 'not as complex as celery' projects such as huey or django-background-tasks. – Andy Baker Dec 13 '14 at 11:35
  • My experience is that spawning a thread is more expensive from an engineering perspective. – cwallenpoole Dec 15 '14 at 15:34
  • Celery is too heavyweight in many cases and should not be the fallback position for requests involving async work. If an async transaction is going to kill a minute of CPU time, fine, go Celery. When a user logs in, I want to pull certain user data to memcache so that I can access it quickly as they navigate my system. For this, Celery sucks. I don't want the user login page to block while that caching takes place, though. Django is great for some things, but if you depend on sequential, external RPCs (ORM, memcache, etc.), it will flush cycles/memory down the toilet with reckless abandon. – Sniggerfardimungus Apr 16 '15 at 23:11
  • I would add that there's more than just the request cycle in Django; I wanted an answer about multithreading for Django admin commands. – FlogFR May 19 '15 at 10:03
  • "Feel free to suggest an alternative" RabbitMQ. – aaa90210 Jul 17 '15 at 06:30
  • Or maybe just `threading.Thread`? I have a feeling, though, that threads spawned by a view are killed when the view finishes, defeating the purpose (but I could not find the right documentation yet). – Herbert Jun 09 '16 at 12:13
  • You need to set `daemon=True` for it to survive the request handling. – Herbert Jun 09 '16 at 12:28
  • (If you have other suggestions, then please make them *answers*. I recommended something which I had seen work in the past, it may be outdated, and it may be a battle-axe for a hangnail, but it happened to work. One of the major reasons we *have* this site is so that people can propose alternate answers and not just one-off on comments.) – cwallenpoole Jun 09 '16 at 14:05
  • To those stating that "celery is overkill", there are quite a few other async queue solutions for django, you just have to google "django async queue". – bruno desthuilliers Oct 09 '18 at 09:48
  • I recommend also looking into python-rq – Hussain Jan 28 '20 at 02:31
  • Using threading in Python is not real threading, because the Python interpreter has something called the GIL :( – c4f4t0r Feb 07 '20 at 10:53
-4

If you don't want to add an overkill framework to your project, you can simply use subprocess.Popen:

import subprocess

from django.http import HttpResponse

def my_command(request):
    # Start both commands; Popen launches each as a separate child process
    command = '/my/command/to/run'  # Can even be 'python manage.py somecommand'
    subprocess.Popen(command, shell=True)
    command = '/other/command/to/run'
    subprocess.Popen(command, shell=True)
    return HttpResponse(status=204)

[edit] As mentioned in the comments, this will not start a background task and return the HttpResponse right away. It will execute both commands in parallel and then return the HttpResponse once both are complete, which is what the OP asked for.
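
If you want that wait-for-both behaviour to be explicit rather than left to the server setup, a small variation (a sketch, not tested against any particular deployment) keeps the Popen handles and waits on both before responding:

import subprocess

from django.http import HttpResponse

def my_command(request):
    # Start both commands in parallel, keeping a handle to each child process
    procs = [
        subprocess.Popen('/my/command/to/run', shell=True),
        subprocess.Popen('/other/command/to/run', shell=True),
    ]
    # Block until both children have exited, then respond
    for p in procs:
        p.wait()
    return HttpResponse(status=204)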

Thierry J.
  • This doesn't work (at least on my fairly standard setup of django + uwsgi + nginx) to quickly return the HTTP Response while launching a long-running task to churn in the background. It instead launches the subprocess, but will not return the HTTP Response until after the subprocess terminates (even if you add '&' at the end of the command). Further, if the webserver times out it kills the process, which will not finish. E.g., try the command with `/bin/sleep 15` (will take 15 seconds) or `/bin/sleep 60` or `/bin/sleep 900 && echo 'hello' > /tmp/tmptest123` (will timeout and not finish). – dr jimbob Jun 09 '18 at 03:23
  • Indeed, but that is not what the OP asked for. `subprocess` will let you run multi-threaded functions and return the HTTP response after those are completed. – Thierry J. Jul 30 '19 at 10:17