Scrapy and Django are both standalone Python frameworks for building crawlers and web applications, respectively, with little code. Still, whenever you want to create a spider you have to generate a new code file and write much the same code again (with some variation). I was trying to integrate the two, but I am stuck at the point where I need to send a 200 OK status saying that the spider ran successfully while, at the same time, the spider keeps running and, when it finishes, saves its data to the database.

I know that APIs for this are already available with scrapyd, but I wanted to make it more versatile, so that you can create a crawler without writing multiple files. I thought CrawlerRunner (https://docs.scrapy.org/en/latest/topics/practices.html) would help with this, so I also tried the approach from "Easiest way to run scrapy crawler so it doesn't block the script", but it gives me this error: builtins.ValueError: signal only works in main thread

I do get the response back from the REST framework, but the crawler fails to run because of this error. Does that mean I need to switch to the main thread? I am doing this with a simple piece of code:

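from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor

# GeneralSpider is the project's spider class; pk is passed through to it
runner = CrawlerRunner()
d = runner.crawl(GeneralSpider, pk)  # crawl() takes the spider class and instantiates it
d.addBoth(lambda _: reactor.stop())
reactor.run()  # this is where the ValueError appears outside the main thread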
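
For context, the error message itself comes from Python's standard signal module, which refuses to register handlers from any thread other than the main one. The following is a minimal sketch that reproduces the same ValueError without Scrapy or Django; the helper name try_install_handler is made up for illustration, and the exact wording of the message varies between Python versions.

```python
import signal
import threading


def try_install_handler(results):
    """Try to register a SIGINT handler and record the outcome."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        results.append("ok")
    except ValueError as exc:
        # Raised because this function runs outside the main thread
        results.append(str(exc))


results = []
worker = threading.Thread(target=try_install_handler, args=(results,))
worker.start()
worker.join()
print(results[0])
```

A web server typically handles requests in worker threads, so anything that tries to install signal handlers from inside a view can fail in the same way.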
Gaurav

1 Answer

I ran a Scrapy spider in a Django view; here is my code.

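import os
from pydoc import locate

from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

settings_file_path = "scraping.settings"  # Scrapy project settings module
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', settings_file_path)
settings = get_project_settings()
runner = CrawlerRunner(settings)

# Turn the spider's file path into a dotted module path
path = "/path/to/sample.py"
path = path.replace('.py', '')
path = path.replace('/', '.')
file_path = "{}.SampleSpider".format(path)

# locate() imports the module and returns the SampleSpider class
SampleSpider = locate(file_path)

d = runner.crawl(SampleSpider)
d.addBoth(lambda _: reactor.stop())
reactor.run()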
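
Note that reactor.run() installs signal handlers by default, which is what fails when it is called outside the main thread. One possible workaround, shown here only as a sketch, is to start the reactor once in a background thread with installSignalHandlers=False (a real parameter of Twisted's reactor.run()). The helper name start_reactor_in_background is hypothetical, and the reactor is passed in as an argument so the snippet itself only needs the standard library.

```python
import threading


def start_reactor_in_background(reactor):
    """Run a Twisted-style reactor in a daemon thread without signal handlers.

    `reactor` is expected to be twisted.internet.reactor (an assumption of
    this sketch); any object with a compatible run() method also works.
    """
    thread = threading.Thread(
        target=reactor.run,
        kwargs={"installSignalHandlers": False},  # avoids the main-thread check
        daemon=True,  # do not keep the process alive just for the reactor
    )
    thread.start()
    return thread


# Intended use (assumes Twisted and Scrapy are installed):
#   from twisted.internet import reactor
#   start_reactor_in_background(reactor)                # once, at startup
#   reactor.callFromThread(runner.crawl, SampleSpider)  # from inside a view
```

With this arrangement the view can return its response right away while the crawl continues in the reactor thread. The reactor.stop() callback would have to be dropped in that case, because a Twisted reactor cannot be restarted once it has been stopped.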

I hope it's helpful.

Xueming