I have more than 100 spiders and I want to run 5 of them at a time from a script. To track this, I have created a table in my database that records each spider's status: finished, running, or waiting to run.
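(For reference, a minimal sketch of what the status table looks like; I am using sqlite3 here purely for illustration, and the actual database and column names may differ:)

    import sqlite3

    # illustration only; the real database and column names may differ
    conn = sqlite3.connect('spiders.db')
    conn.execute("""
        CREATE TABLE IF NOT EXISTS spider_status (
            name   TEXT PRIMARY KEY,  -- spider name as registered in the project
            status TEXT NOT NULL      -- 'waiting', 'running', or 'finished'
        )
    """)
    conn.commit()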
I know how to run multiple spiders inside a script:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

for i in range(10):
    # this range is just for demo; in reality I look up the
    # spiders that are waiting to run from the database
    process.crawl(spider1)  # spider name changes based on which spider is due to run
    process.crawl(spider2)
    print('-------------this is the-----{}--iteration'.format(i))
    process.start()
But this is not allowed, because the following error occurs:
Traceback (most recent call last):
  File "test.py", line 24, in <module>
    process.start()
  File "/home/g/projects/venv/lib/python3.4/site-packages/scrapy/crawler.py", line 285, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/home/g/projects/venv/lib/python3.4/site-packages/twisted/internet/base.py", line 1242, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/home/g/projects/venv/lib/python3.4/site-packages/twisted/internet/base.py", line 1222, in startRunning
    ReactorBase.startRunning(self)
  File "/home/g/projects/venv/lib/python3.4/site-packages/twisted/internet/base.py", line 730, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
I have searched for the above error but have not been able to resolve it. Managing spiders is possible via ScrapyD, but we do not want to use ScrapyD because many spiders are still in the development phase.
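From what I understand after reading the Scrapy docs, the reactor can only be started once per process, so instead of calling process.start() in a loop, all crawls would have to be scheduled against a single running reactor. Below is a rough sketch of what I am considering, using CrawlerRunner and Twisted deferreds; get_waiting_spiders is a placeholder for my database query, not real code:

    from twisted.internet import defer, reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings

    def get_waiting_spiders(limit=5):
        # placeholder: query the status table for up to `limit` spider
        # names that are waiting to run, and mark them as running
        return []

    configure_logging()
    runner = CrawlerRunner(get_project_settings())

    @defer.inlineCallbacks
    def crawl_batches():
        while True:
            batch = get_waiting_spiders(limit=5)
            if not batch:
                break
            # run up to 5 spiders concurrently; wait for the whole batch to finish
            yield defer.DeferredList([runner.crawl(name) for name in batch])
            # here I would mark the batch as finished in the database

    crawl_batches().addBoth(lambda _: reactor.stop())
    reactor.run()  # started exactly once; stops after all batches complete

This keeps the reactor running across all batches, which (if I understand correctly) should avoid ReactorNotRestartable, but I am not sure it is the right approach.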
Any workaround for the above scenario is appreciated.
Thanks