
I'm currently running Scrapy v2.5, and I'd like to run all my spiders in an infinite loop. My code:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class main():

    def bucle(self, array_spider, process):
        mongo = mongodb(setting)
        for spider_name in array_spider:
            process.crawl(spider_name, params={"mongo": mongo, "spider_name": spider_name})
        process.start()
        process.stop()
        mongo.close_mongo()

if __name__ == "__main__":
    setting = get_project_settings()
    while True:
        process = CrawlerProcess(setting)
        array_spider = process.spider_loader.list()
        class_main = main()
        class_main.bucle(array_spider, process)

But that results in the following error:

Traceback (most recent call last):
  File "run_scrapy.py", line 92, in <module>
    process.start()
  File "/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py", line 327, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1422, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1404, in startRunning
    ReactorBase.startRunning(cast(ReactorBase, self))
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 843, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable

Can anyone help me?

  • If you use Linux then maybe you should use `cron` to start it every few minutes. – furas May 09 '21 at 20:14
  • I'm not sure, but this can start many spiders in a short time, and that can cause problems. – furas May 09 '21 at 20:17
  • You could use `print()` to see which values it has problems with. You should check whether it fails on the first run or on the second, when `process.start()` is executed again after the previous `process.stop()`. Maybe the whole problem is `process.stop()`, which may kill the process so that it can't be started again. – furas May 09 '21 at 20:20
  • I had the same issue and fixed it following this answer using [Crochet](https://stackoverflow.com/a/57347964/11651988). – Murat Demir May 09 '21 at 21:33

1 Answer


AFAIK there is no simple way to restart a spider, but there is an alternative: a spider that never closes. For this you can use the `spider_idle` signal.

According to the documentation:

Sent when a spider has gone idle, which means the spider has no further:  
* requests waiting to be downloaded
* requests scheduled
* items being processed in the item pipeline

You can also find examples of using Signals in the official documentation.