
I'm working on a project that exposes a GET request with a few parameters (such as airline code and flight date). I pass those parameters to a crawler written with Scrapy. I've created a Django command in management/commands that runs the Scrapy crawler, and as soon as the scraping is done the data is saved to particular models. I want to return that saved data in the response to the same GET request, and I have a few questions about it.

  • How do I return the data in the GET response? I couldn't find a way to get the scraped data back from the Scrapy crawler.

  • How do I make the GET request wait for a certain time while the scraping is in progress?

  • Assuming the GET request is waiting and the scraping has finished, I then have to check whether the database contains scraped data matching the params.

  • Assuming the data is found in the database, how can I return it to the same GET request?

  • If I want to use Celery in this whole process, what would be the best use of it?

  • Your question needs some code. Data is returned from a GET request in the body of the response, where the HTML usually is, or the JSON in an API call. The response goes out only when you send it, so just wait until Scrapy has finished crawling before returning the response. Saving the scraped content to the database and then pulling it back out of the database to return in the response is a bad idea; it is better to collect the data while Scrapy is crawling and return it all when it has finished saving to the database. – Alexander Aug 15 '22 at 07:57

1 Answer


You can create an async task that scrapes the data with your scraper, then check repeatedly whether the task has completed. Once it has, call your API, which returns the scraped data. This takes three views:

  1. CreateTaskView to create the job
  2. CheckTaskCompletionView to check whether the job has finished
  3. YourAPIView to get the data

from django.core.management import call_command

from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework import status

# Configure Celery and import its app instance (here from cel.tasks)
from cel.tasks import app
from celery.result import AsyncResult


@app.task(bind=True)
def your_async_task(self, *args, **kwargs):
    # Run the management command that starts the Scrapy crawler, inside a Celery worker
    call_command("your_command_name", *args, **kwargs)

class CreateTaskView(APIView):

    def get(self, request):
        my_task = your_async_task.delay("Hi", keyword_arg="Hello World!")
        return Response({'task_id': my_task.id}, status=status.HTTP_200_OK)

class CheckTaskCompletionView(APIView):

    def get(self, request):
        res = AsyncResult(request.GET.get('task_id'), app=app)
        return Response({'task_state': res.state}, status=status.HTTP_200_OK)

class YourAPIView(APIView):
    # Once task_state is SUCCESS, call this view's GET endpoint to fetch the relevant data
    pass
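
To illustrate the "check repeatedly" step from the caller's side, here is a minimal sketch of a polling loop. It is only one way to do it, not part of the answer's code: `wait_for_task`, `fetch_state`, and the `timeout`/`interval` defaults are hypothetical names and values chosen for the example. `fetch_state` stands for whatever you use to read `task_state` from the CheckTaskCompletionView response. The set of terminal states matches Celery's ready states (SUCCESS, FAILURE, REVOKED).

```python
import time
from typing import Callable

# Celery's terminal task states; a task in one of these will not change again.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def wait_for_task(
    fetch_state: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state() until it reports a terminal state or the timeout elapses.

    fetch_state is a hypothetical callable that returns the current task state,
    for example by requesting the status endpoint with the task_id.
    Returns the last state seen, which may still be non-terminal on timeout.
    """
    deadline = time.monotonic() + timeout
    state = fetch_state()
    while state not in TERMINAL_STATES and time.monotonic() < deadline:
        sleep(interval)  # wait before asking again, to avoid hammering the server
        state = fetch_state()
    return state
```

If the returned state is "SUCCESS", the caller can go on to request YourAPIView with the original params; any other state (including a timeout that leaves the task at "PENDING" or "STARTED") should be treated as "no data yet" or as an error.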