
My interface must give users an option to start a long web-scraping operation, which could take anywhere from a few minutes to a couple of hours. While it runs, it persists data to the database. It's currently coded like this:

def my_view(request):
    batch = BatchTable.objects.create(user=request.user)  # Batch model registers who started the operation
    try:
        long_running_operation(batch)  # How can THIS be made to run after response being sent?
    except Exception as ex:
        batch.fail = str(ex)
        batch.failed = datetime.now()
        batch.save()
    return JsonResponse({'success': True, 'message': 'Batch started'})


def long_running_operation(batch):
    list_response = requests.get('https://agiven.web.service/rest/api/objects')
    object_list = list_response.json()
    batch.total = len(object_list)
    batch.save()
    for object_entry in object_list:
        object_response = requests.get(f'https://agiven.web.service/rest/api/objects/{object_entry["id"]}')
        object_dict = object_response.json()
        object_dict['batch'] = batch  # to link the entry to the batch in which it was read and created
        ObjectTable.objects.create(**object_dict)  # to persist object data in the bd
        batch.progress += 1
        batch.save()
    batch.finalized = datetime.now()
    batch.save()

my_view() is hit by an AJAX request, but of course this request will hang until long_running_operation() has finished.

Is there some way to send the response from my_view() first, and only then run long_running_operation()?

VBobCat
    Make use of a tool like `Celery` that will asynchronously run a task: https://realpython.com/asynchronous-tasks-with-django-and-celery/ – Willem Van Onsem Feb 08 '20 at 21:48

1 Answer


You should use a task queue service, such as Celery, to execute the task in another thread or process.
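For intuition only, the "respond first, work later" idea can be sketched with a bare thread; a task queue is still preferable in production, since threads die with the web process and give you no retries or monitoring. All names here are illustrative, not from any framework:

```python
import threading
import time

results = []  # stands in for the database writes

def long_running_operation(batch_id):
    # simulate a slow scraping job
    time.sleep(0.1)
    results.append(batch_id)

def my_view():
    # fire-and-forget: the "response" is returned while the work continues
    worker = threading.Thread(target=long_running_operation, args=(42,))
    worker.start()
    return worker, 'Batch started'
```

The caller gets 'Batch started' back immediately, long before `results` is populated.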

Here is a minimal example, with the following dependencies:

  • pip3 install "celery[redis]"
  • brew install redis (macOS) or install manually from here
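Optionally, the broker settings can live in Django's settings.py instead; the config_from_object(..., namespace='CELERY') call picks up any CELERY_-prefixed names automatically. A minimal fragment (the local Redis URL is an assumption):

```python
# proj/settings.py
CELERY_BROKER_URL = 'redis://127.0.0.1:6379/0'      # maps to Celery's broker_url
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/0'  # optional: store task results in Redis
```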

First create the file proj/proj/celery.py:

import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj', broker='redis://127.0.0.1:6379/0')

app.config_from_object('django.conf:settings', namespace='CELERY')

app.autodiscover_tasks()
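The Celery docs for Django also recommend loading this app in proj/proj/__init__.py, so it is created whenever Django starts and your tasks get registered:

```python
# proj/proj/__init__.py
from .celery import app as celery_app

__all__ = ('celery_app',)
```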

Now create a tasks.py file in your app directory and move long_running_operation there. Pass the batch's primary key rather than the model instance: task arguments are serialized, and Celery's default JSON serializer can't handle Django model objects.

from proj.celery import app
from myapp.models import BatchTable

@app.task
def long_running_operation(batch_id):
    batch = BatchTable.objects.get(pk=batch_id)
    ...

Next, call the task with .delay() from your view:

from myapp.tasks import long_running_operation

def my_view(request):
    batch = BatchTable.objects.create(user=request.user)
    long_running_operation.delay(batch.id)  # queues the task and returns immediately
    return JsonResponse({'success': True, 'message': 'Batch started'})
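One caveat: once the work happens in a worker, the try/except in the original view never sees failures. A sketch of recording them inside the task instead (model and field names taken from the question; the app name myapp is an assumption, and the task receives the primary key since model instances don't JSON-serialize):

```python
from datetime import datetime

from proj.celery import app

@app.task
def long_running_operation(batch_id):
    from myapp.models import BatchTable  # assumed app name
    batch = BatchTable.objects.get(pk=batch_id)
    try:
        ...  # the scraping loop from the question
    except Exception as ex:
        # persist the failure so the UI can report it
        batch.fail = str(ex)
        batch.failed = datetime.now()
        batch.save()
```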

Finally, before you start your server, make sure a Redis instance is running on your machine by calling redis-server from the CLI, then start the worker with celery -A proj worker -l info.

In terms of performance and features, Celery is certainly one of the best tools available for Django at the moment, but it does add a fair amount of overhead for simple use cases.

I haven't tried it myself, but I've heard good things about django-background-tasks, which might be a better alternative if your app doesn't have a high demand.
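For reference, a rough sketch based on that package's documented API (untested here): decorating a function with @background makes calling it queue the work in the database, and python manage.py process_tasks runs a worker that executes it.

```python
# myapp/tasks.py
from background_task import background

@background(schedule=0)  # run as soon as a worker picks it up
def long_running_operation(batch_id):
    ...  # the scraping loop, loading the batch by primary key

# in the view, calling the decorated function queues it instead of running it:
# long_running_operation(batch.id)
```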

Lord Elrond
    Thank you very much! If I could, I would upvote your answer 100x, since you went the extra mile and showed how to implement it. I'm a novice in Django, so I really needed that. `django-background-tasks` was another precious tip; I guess I'll stick with it for the moment, due to its simpler approach. – VBobCat Feb 09 '20 at 12:30