I am trying to schedule Scrapy crawls with Celery and ran into the common `ReactorNotRestartable` error. These earlier threads discuss it:

- ReactorNotRestartable - Twisted and scrapy
- Scrapy - Reactor not Restartable

The library I am using requires `twisted.internet.asyncioreactor.AsyncioSelectorReactor` instead of the default reactor. If I follow the examples from those threads, the crawl fails because the requested reactor does not match the installed one. I modified the code to create the required reactor myself, but I still get the same mismatch exception.
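
```python
from multiprocessing import Process, Queue

from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from twisted.internet.asyncioreactor import AsyncioSelectorReactor


def run_spider(spider, domain=None, check=None):
    def f(q):
        # Runs in the child process; reports success (None) or the exception via q.
        try:
            configure_logging()
            runner = CrawlerRunner(get_project_settings())
            deferred = runner.crawl(spider, domain=domain, check=check)
            deferred.addBoth(lambda _: reactor.stop())
            # My attempt at using the required reactor instead of the default one.
            reactor = AsyncioSelectorReactor()
            reactor.run()
            q.put(None)
        except Exception as e:
            print("EXCEPTION!")
            q.put(e)

    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    # Re-raise in the parent whatever the child reported.
    if result is not None:
        raise result
```
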
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/code/scrapy_parsing/scripts/run_spider.py", line 203, in crawl
    yield runner.crawl(spider)
  File "/usr/local/lib/python3.10/site-packages/scrapy/crawler.py", line 232, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.10/site-packages/scrapy/crawler.py", line 266, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.10/site-packages/scrapy/crawler.py", line 271, in _create_crawler
    return Crawler(spidercls, self.settings)
  File "/usr/local/lib/python3.10/site-packages/scrapy/crawler.py", line 103, in __init__
    verify_installed_reactor(reactor_class)
  File "/usr/local/lib/python3.10/site-packages/scrapy/utils/reactor.py", line 138, in verify_installed_reactor
    raise Exception(msg)
Exception: The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)
```