I deployed my web crawler to AWS Lambda. While testing, it ran correctly the first time, but the second invocation failed with twisted.internet.error.ReactorNotRestartable:
  File "/var/task/main.py", line 19, in run_spider
    reactor.run()
  File "/var/task/twisted/internet/base.py", line 1282, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/var/task/twisted/internet/base.py", line 1262, in startRunning
    ReactorBase.startRunning(self)
  File "/var/task/twisted/internet/base.py", line 765, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
The crawler works fine in my local Python environment. This is the function I am running inside main.py:
from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

def run_spider(event, s):
    given_links = []
    for t in event["Records"]:
        given_links.append(t["body"])
    print(given_links)
    runner = CrawlerRunner(s)
    deferred = runner.crawl('spider', crawl_links=given_links)
    deferred.addCallback(lambda _: reactor.stop())
    reactor.run()

def lambda_handler(event, context=None):
    s = get_project_settings()
    s['FEED_FORMAT'] = 'csv'
    s['FEED_URI'] = '/tmp/output.csv'
    run_spider(event, s)
where the event looks like this:
{
    "Records": [
        {
            "body": "https://example.com"
        }
    ]
}
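For what it's worth, the link-extraction part of run_spider works on its own; with the event above it just collects the "body" of each record, so the failure seems to be purely in the reactor handling. A stdlib-only sketch of that loop:

```python
def extract_links(event):
    # Collect the URL stored in each record's "body" field
    return [record["body"] for record in event["Records"]]

event = {"Records": [{"body": "https://example.com"}]}
links = extract_links(event)
print(links)  # ['https://example.com']
```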
Initially, I was using CrawlerProcess instead of CrawlerRunner, but it raised the same error. After reading some answers on Stack Overflow, I changed my code to use CrawlerRunner. Some people also suggested using Crochet; I tried that and got a different error:

ValueError: signal only works in main thread
What can I do to resolve this error?