
I want to run a Scrapy spider from Django views. I tried `CrawlerRunner` and `CrawlerProcess`, but there are problems: views are synchronous, and furthermore the crawler does not return a response directly.

I tried a few ways:

# Core imports.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Third-party imports.
from rest_framework.views import APIView
from rest_framework.response import Response

# Local imports.
from scrapy_project.spiders.google import GoogleSpider


class ForFunAPIView(APIView):
    def get(self, *args, **kwargs):
        process = CrawlerProcess(get_project_settings())
        process.crawl(GoogleSpider)
        process.start()
        return Response('ok')

Is there any solution to handle this and run the spider directly from other scripts or projects, without using the DjangoItem pipeline?

Mohsn
    Does this answer your question? [Building a RESTful Flask API for Scrapy](https://stackoverflow.com/questions/32724537/building-a-restful-flask-api-for-scrapy) – May.D Jan 04 '23 at 08:01

2 Answers


You didn't really specify what the problems are; however, I guess the problem is that you need to return the `Response` immediately and leave the heavy call (the crawl) to run in the background. You can alter your code as follows, using the `threading` module:

from threading import Thread

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from rest_framework.views import APIView
from rest_framework.response import Response

from scrapy_project.spiders.google import GoogleSpider


class ForFunAPIView(APIView):
    def get(self, *args, **kwargs):
        process = CrawlerProcess(get_project_settings())
        process.crawl(GoogleSpider)

        # Run the blocking process.start() call in a background thread
        # so the view can return immediately.
        thread = Thread(target=process.start)
        thread.start()

        return Response('ok')
Radwan
  • I am not sure you would be able to run the spider using multithreading, since the Twisted reactor doesn't like being restarted. This will work the first time (maybe?) but will raise an error after that. I think you have to use multiprocessing. – zaki98 Jan 03 '23 at 14:46
  • Thanks for your help. Yes, I didn't explain more about what I want to do, but I got it. – Mohsn Jan 04 '23 at 07:54
  • @zaki98 You might be right; however, I used similar code before for a testing project and it worked nicely with the threading module, without the need for multiprocessing. BTW, check this out, it's a helpful answer for a similar problem: https://stackoverflow.com/questions/32724537/building-a-restful-flask-api-for-scrapy#answer-32784312 – Radwan Jan 04 '23 at 12:31
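If you do hit `ReactorNotRestartable`, the multiprocessing route zaki98 suggests means giving each request a fresh child process (and therefore a fresh reactor). A minimal stdlib sketch of that pattern, where `crawl` is a hypothetical stand-in for building a `CrawlerProcess` and calling `process.start()`:

```python
from multiprocessing import Process

# Hypothetical stand-in for the CrawlerProcess(...).crawl(...)/start()
# pair; a real view would run GoogleSpider here instead.
def crawl():
    print("spider finished")

if __name__ == "__main__":
    # Each request spawns a fresh child process, so each one gets a
    # fresh Twisted reactor and ReactorNotRestartable never fires,
    # even across repeated calls.
    for _ in range(2):
        p = Process(target=crawl)
        p.start()
        p.join()
```

Inside a view you would start the `Process` without joining it, just as the threading answer does, so the response returns immediately.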

After a while of searching on this topic, I found a good explanation here: [Building a RESTful Flask API for Scrapy](https://stackoverflow.com/questions/32724537/building-a-restful-flask-api-for-scrapy)
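The linked answer uses the `crochet` library. Its core idea is to start one long-lived background thread for the Twisted reactor at import time and dispatch crawl jobs to it, rather than restarting the reactor per request. That dispatch pattern can be sketched with the stdlib alone; `run_spider` here is a hypothetical stand-in for a `CrawlerRunner.crawl` call:

```python
import queue
import threading

# One long-lived worker thread plays the role of crochet's background
# reactor thread: started once at import time, never restarted.
_jobs = queue.Queue()

def _worker():
    while True:
        func, args, done = _jobs.get()
        try:
            done["result"] = func(*args)
        except Exception as exc:  # report failures back to the caller
            done["error"] = exc
        finally:
            done["event"].set()

threading.Thread(target=_worker, daemon=True).start()

def wait_for(func, *args, timeout=10.0):
    """Submit a job to the background thread and block until it finishes,
    similar in spirit to crochet's wait_for decorator."""
    done = {"event": threading.Event()}
    _jobs.put((func, args, done))
    if not done["event"].wait(timeout):
        raise TimeoutError("background job did not finish in time")
    if "error" in done:
        raise done["error"]
    return done["result"]

# Hypothetical stand-in for a CrawlerRunner.crawl call.
def run_spider(query):
    return f"scraped results for {query!r}"

print(wait_for(run_spider, "python"))  # -> scraped results for 'python'
```

With `crochet` itself, you would call `crochet.setup()` once and let it manage the reactor thread, which is what makes the approach in the linked answer safe across many requests.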

Mohsn