So the issue is that I have one spider that crawls a website and scrapes a bunch of product information. I would then like a second spider that takes the list of product links built up by the first and uses it for checking purposes.
I realize I could do this all in one spider, but that spider is already very large (it is a generic spider covering 25+ different domains) and I would like to keep these concerns as separated as possible. Currently I am creating instances of this master spider as follows:
```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def run_spiders(*urls, ajax=False):
    process = CrawlerProcess(get_project_settings())
    # queue one MasterSpider crawl per start URL
    for url in urls:
        process.crawl(MasterSpider, start_page=url, ajax_rendered=ajax)
    # start() runs the reactor and blocks until every queued crawl finishes
    process.start()
```
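For example, I call it along these lines (the URLs here are just placeholders):

```python
run_spiders("https://example.com/shop", "https://example.org/store", ajax=True)
```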
Ideally this would work something like what is described in these questions:
- Scrapy run multiple spiders from a main spider?
- Is it possible to run another spider from Scrapy spider?
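Based on those questions, the pattern I am aiming for is roughly the "running multiple spiders in the same process" approach from the Scrapy docs, sketched below (`LinkCheckSpider` is just a placeholder name for the second spider), though as I note further down I have not managed to get this working:

```python
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

# MasterSpider is my existing spider; LinkCheckSpider is the hypothetical second one

configure_logging()
runner = CrawlerRunner(get_project_settings())

@defer.inlineCallbacks
def crawl(urls, ajax=False):
    # scrape every domain first...
    for url in urls:
        yield runner.crawl(MasterSpider, start_page=url, ajax_rendered=ajax)
    # ...then run the checking spider once all the scrapes have finished
    yield runner.crawl(LinkCheckSpider)
    reactor.stop()

crawl(["https://example.com"])  # placeholder start URL
reactor.run()  # blocks here until crawl() stops the reactor
```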
I tried spawning another crawler process from within the closed_handler of the MasterSpider, but the reactor is already running at that point, so clearly that isn't going to work.
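Simplified, that failed attempt looked something like this (`product_links` is a placeholder for wherever I accumulate the links, and `LinkCheckSpider` for the second spider):

```python
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class MasterSpider(scrapy.Spider):
    # ... parsing logic that fills self.product_links ...

    def closed_handler(self, spider):
        # naive approach: spawn a second process once this spider closes
        process = CrawlerProcess(get_project_settings())
        process.crawl(LinkCheckSpider, links=self.product_links)
        # fails: the reactor is already running at this point
        # (twisted.internet.error.ReactorAlreadyRunning)
        process.start()
```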
Note that whenever I try to switch to a CrawlerRunner, it doesn't quite work even when I follow exactly what is in the documentation and the questions linked above. I'm thinking that from_crawler might be the way to go, but I'm not entirely sure.
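For reference, the from_crawler hook I have in mind is the standard signal-wiring pattern from the docs, something like:

```python
import scrapy
from scrapy import signals

class MasterSpider(scrapy.Spider):
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # connect the close handler here, where we have access to the running crawler
        crawler.signals.connect(spider.closed_handler, signal=signals.spider_closed)
        return spider
```

Any ideas?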