I have a website where my crawler needs to follow a specific sequence: for example, it needs to crawl a1, b1, c1 before it starts on a2, and so on. Each of a, b and c is handled by a different parse function, and the corresponding URLs are wrapped in Request objects and yielded. The following roughly illustrates the code I'm using:
from scrapy.http import Request
from scrapy.spider import BaseSpider

class aspider(BaseSpider):
    def parse(self, response):
        # b is the next URL in the sequence for this item
        yield Request(b, callback=self.parse_b, priority=10)

    def parse_b(self, response):
        yield Request(c, callback=self.parse_c, priority=20)

    def parse_c(self, response):
        final_function()
However, I find that the actual crawl order seems to be a1, a2, a3, b1, b2, b3, c1, c2, c3, which is strange, since I thought Scrapy was supposed to crawl depth-first by default.
The sequence doesn't have to be strict, but the site I'm scraping has a limit in place, so Scrapy needs to start scraping level c as soon as it can, before five of the level-b pages have been crawled. How can this be achieved?
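One idea I've been considering is to chain the requests explicitly instead of relying on priorities: only yield the next a-level URL from parse_c, carrying the remaining list along in request.meta. Below is a rough sketch of what I mean; a_urls, build_b_url and build_c_url are placeholders for my real URL list and URL builders, not code I actually have. Is something like this the right approach, or is there a better way?

from scrapy.http import Request
from scrapy.spider import BaseSpider

class ChainedSpider(BaseSpider):
    name = "chained"
    # Start with only the first a-level URL; the rest are scheduled later.
    start_urls = a_urls[:1]

    def parse(self, response):
        # Remember which a-level URLs are still pending.
        remaining = response.meta.get('remaining', a_urls[1:])
        yield Request(build_b_url(response), callback=self.parse_b,
                      meta={'remaining': remaining})

    def parse_b(self, response):
        yield Request(build_c_url(response), callback=self.parse_c,
                      meta=response.meta)

    def parse_c(self, response):
        final_function()
        remaining = response.meta['remaining']
        if remaining:
            # Only now move on to the next a-level page.
            yield Request(remaining[0], callback=self.parse,
                          meta={'remaining': remaining[1:]})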