Integrating Django Rest Framework and Scrapy

Question

Scrapy and Django are both well-established standalone Python frameworks, for building crawlers and web applications respectively, with little code. Still, every time you want a new spider you have to generate a new code file and write much the same code again (with some variation). I am trying to integrate the two, but I am stuck at the point where I need to return a 200_OK status saying that the spider was started successfully while the spider keeps running and, when it finishes, saves its data to the database.

I know that an API for this is already available through scrapyd, but I wanted something more versatile that lets you create a crawler without writing multiple files. I thought CrawlerRunner (https://docs.scrapy.org/en/latest/topics/practices.html) would help with this, so I also tried the approach from "Easiest way to run scrapy crawler so it doesn't block the script", but it gives me the error builtins.ValueError: signal only works in main thread

I do get the response back from the REST framework, but the crawler fails to run because of this error. Does that mean I need to switch to the main thread? I am doing this with a simple piece of code:

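from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor

spider = GeneralSpider(pk)  # unused: runner.crawl() takes the class and its arguments
runner = CrawlerRunner()
d = runner.crawl(GeneralSpider, pk)  # returns a Deferred
d.addBoth(lambda _: reactor.stop())  # stop the reactor once the crawl ends
reactor.run()  # blocks until reactor.stop() is called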
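
For context, the ValueError comes from Python's own signal module rather than from Scrapy: signal.signal() may only be called from the main thread, and reactor.run() installs signal handlers by default. The sketch below reproduces the same error with the standard library only, assuming (as seems likely here) that the view code runs in a worker thread. The name install_handler_in_worker is hypothetical.

```python
import signal
import threading


def install_handler_in_worker():
    """Try to register a SIGINT handler from a non-main thread.

    Returns the error message if registration fails, otherwise None.
    """
    outcome = {"error": None}

    def target():
        try:
            signal.signal(signal.SIGINT, signal.default_int_handler)
        except ValueError as exc:
            # Python refuses to register handlers outside the main thread.
            outcome["error"] = str(exc)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome["error"]


if __name__ == "__main__":
    print(install_handler_in_worker())
```

Twisted's reactor.run() accepts installSignalHandlers=False, which skips the handler registration. Whether that alone is enough in a given Django deployment is not tested here.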

score 2 · Answer 1 · answered Oct 19 '20 at 07:46

I ran a Scrapy spider in a Django view; here is my code.

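import os
from pydoc import locate  # assumed source of locate(); the original omits its imports

from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

settings_file_path = "scraping.settings"  # Scrapy project settings module
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', settings_file_path)
settings = get_project_settings()
runner = CrawlerRunner(settings)

# Convert the spider's file path into a dotted import path
path = "/path/to/sample.py"
path = path.replace('.py', '')
path = path.replace('/', '.').strip('.')  # drop the dot left by the leading slash
file_path = "{}.SampleSpider".format(path)

# Import the spider class by its dotted path
SampleSpider = locate(file_path)

d = runner.crawl(SampleSpider)
d.addBoth(lambda _: reactor.stop())
reactor.run()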
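
The path handling above can be checked on its own without Scrapy. This is a minimal sketch, assuming locate is pydoc.locate. The helper name dotted_path and the "scraping/spiders/sample.py" layout are hypothetical, and a standard-library class stands in for the spider.

```python
from pydoc import locate


def dotted_path(file_path, class_name):
    """Turn 'pkg/module.py' plus a class name into 'pkg.module.ClassName'."""
    module_path = file_path[:-3] if file_path.endswith(".py") else file_path
    module_path = module_path.replace("/", ".").strip(".")
    return "{}.{}".format(module_path, class_name)


# Hypothetical project layout, used only to show the string conversion.
print(dotted_path("scraping/spiders/sample.py", "SampleSpider"))
# → scraping.spiders.sample.SampleSpider

# locate() imports the module and returns the named attribute,
# or None when nothing can be found under that path.
decoder_cls = locate(dotted_path("json/decoder.py", "JSONDecoder"))
print(decoder_cls)

missing = locate("no_such_package_xyz.Missing")
print(missing)
```

Since locate() returns None instead of raising, it may be worth checking the result before passing it to runner.crawl().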

I hope it's helpful.
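
Note that reactor.run() blocks until reactor.stop() is called, so a view that calls it directly cannot respond before the crawl ends. The respond-now, save-later behaviour described in the question can be sketched with a background thread. This is a standard-library illustration only: crawl_job, start_crawl and RESULTS are hypothetical stand-ins, and a real Scrapy crawl would also have to respect the reactor's threading constraints, which this sketch ignores.

```python
import threading
import time

RESULTS = {}  # hypothetical stand-in for the database


def crawl_job(pk):
    """Placeholder for the long-running crawl; stores its result when done."""
    time.sleep(0.1)  # simulate the crawl taking some time
    RESULTS[pk] = "scraped data for {}".format(pk)


def start_crawl(pk):
    """Start the job in the background and return a response right away."""
    worker = threading.Thread(target=crawl_job, args=(pk,), daemon=True)
    worker.start()
    # The response is built before the job has finished.
    return {"status": 200, "detail": "crawl started"}, worker


if __name__ == "__main__":
    response, worker = start_crawl(1)
    print(response)  # available immediately
    worker.join()    # only for the demo; a view would not wait
    print(RESULTS[1])
```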
