I have deployed a Scrapy project that crawls whenever a Lambda API request comes in.
It runs perfectly for the first API call, but subsequent calls fail with a ReactorNotRestartable error.
As far as I can understand, the AWS Lambda ecosystem is not killing the process between invocations (the container stays warm), so the Twisted reactor from the first call is still present in memory and cannot be started a second time.
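To illustrate what I mean, here is a minimal standalone sketch (outside Lambda) of the same failure mode: Twisted's default reactor is one-shot per process, so a second run() in the same process raises the error.

from twisted.internet import reactor
from twisted.internet.error import ReactorNotRestartable

reactor.callWhenRunning(reactor.stop)
reactor.run()  # first call: the reactor starts, then stops cleanly

try:
    reactor.run()  # simulates the second API call hitting a warm process
except ReactorNotRestartable:
    print("the reactor cannot be restarted for the lifetime of this process")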
The Lambda log error is as follows:
Traceback (most recent call last):
File "/var/task/aws-lambda.py", line 42, in run_company_details_scrapy
process.start()
File "./lib/scrapy/crawler.py", line 280, in start
reactor.run(installSignalHandlers=False) # blocking call
File "./lib/twisted/internet/base.py", line 1242, in run
self.startRunning(installSignalHandlers=installSignalHandlers)
File "./lib/twisted/internet/base.py", line 1222, in startRunning
ReactorBase.startRunning(self)
File "./lib/twisted/internet/base.py", line 730, in startRunning
raise error.ReactorNotRestartable()
ReactorNotRestartable
The lambda handler function is:
from scrapy.crawler import CrawlerProcess
# (spider import omitted here; CompanyDetailsSpidySpider is defined elsewhere in the project)

def run_company_details_scrapy(event, context):
    process = CrawlerProcess()
    process.crawl(CompanyDetailsSpidySpider)
    process.start()
I found a workaround: keep the reactor alive by passing a flag to the start function:

process.start(stop_after_crawl=False)

The problem with this is that start() wraps the blocking reactor.run() call (as the traceback shows), so with stop_after_crawl=False it never returns, and the handler hangs until the Lambda call times out.
I have tried other solutions, but none of them seems to work. Can anyone guide me on how to solve this problem?
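One direction that seems plausible to me is isolating the crawl in a child process, so the reactor starts and dies with that process instead of living on in the warm container. Below is a rough, untested sketch of the idea; the spider import path is my assumption, and I use a Pipe rather than a multiprocessing.Queue for the result because Queue relies on /dev/shm, which Lambda does not provide.

import multiprocessing

from scrapy.crawler import CrawlerProcess
# assumed import path for the spider
from my_project.spiders import CompanyDetailsSpidySpider

def _crawl(conn):
    # Runs inside the child process: the reactor starts and stops here,
    # so the warm container's main process never calls reactor.run() twice.
    try:
        process = CrawlerProcess()
        process.crawl(CompanyDetailsSpidySpider)
        process.start()  # blocking, but only inside the short-lived child
        conn.send(None)
    except Exception as exc:
        conn.send(exc)
    finally:
        conn.close()

def run_company_details_scrapy(event, context):
    parent_conn, child_conn = multiprocessing.Pipe(duplex=False)
    child = multiprocessing.Process(target=_crawl, args=(child_conn,))
    child.start()
    result = parent_conn.recv()  # wait for the child to report the outcome
    child.join()
    if isinstance(result, Exception):
        raise result

Would something along these lines be viable, or is there a cleaner way?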