I am currently working on my Scrapy project, where I scrape content from internet forums.
I want my application to offer different scraping scenarios, so I wrote a method that should handle these scenarios based on some input flags:
    def work_executor(self, response):
        yield from self.parse_categories(response)
        if not parse_only_categories:
            yield from self.parse_topics()
The parse_topics method should read the categories from MongoDB, send a request to each category, and parse the posts. However, parse_topics is never executed, even though the flag is set.
work_executor is called from the parse method of my spider:
    def parse(self, response):
        yield from self.work_executor(response)
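To make the intended control flow easier to discuss, here is a minimal sketch with Scrapy and MongoDB stripped out. Everything in it is a stand-in: FakeSpider, the returned strings, and treating parse_only_categories as an instance attribute are assumptions for illustration, not my real spider. In plain Python, fully consuming parse runs both generators, so this only shows the behavior I expect, not the cause of the problem.

```python
class FakeSpider:
    """Hypothetical stand-in for the spider, without Scrapy or MongoDB."""

    def __init__(self, parse_only_categories=False):
        # Assumption: the flag lives on the instance.
        self.parse_only_categories = parse_only_categories
        self.calls = []  # records which methods actually started running

    def parse_categories(self, response):
        self.calls.append("parse_categories")
        yield f"category item from {response}"

    def parse_topics(self):
        # In the real spider this would read categories from MongoDB
        # and yield one request per category.
        self.calls.append("parse_topics")
        yield "topic request"

    def work_executor(self, response):
        yield from self.parse_categories(response)
        if not self.parse_only_categories:
            yield from self.parse_topics()

    def parse(self, response):
        yield from self.work_executor(response)


spider = FakeSpider(parse_only_categories=False)
results = list(spider.parse("response"))  # consume the whole generator
print(results)
print(spider.calls)
```

Note that parse_topics only starts once everything yielded by parse_categories has been consumed, because a generator body runs lazily.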
Earlier I did it this way, which worked:
    def parse_categories(self, response):
        # some working code
        yield scrapy.Request(url=self.base_domain + link, callback=self.parse_topics)
However, this solution is not generic enough for my project, so I am looking for alternative ways to achieve this effect with the approach I tried (described in the first part of my post).
Any help would be greatly appreciated.