I want to scrape a list of articles from a site such as cnn.com. I'm currently using Scrapy's CrawlSpider to do so. However, I need the articles to be scraped in order. Right now the crawler fetches the 1st article in the list but then jumps to the 31st, 16th, 24th, 9th, etc.
Is there any way to make the spider crawl the links on a page in order (i.e. top to bottom, since recent articles appear at the top of the list)? I've looked around a little and found this, but unlike that post I don't want to crawl the start_urls in a certain order; I want to crawl the links found on a single start_url in order. Is this possible with Scrapy? I played around with a couple of settings such as DEPTH_PRIORITY, but I'm not sure that's what I'm looking for.
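
To make the goal concrete, here is a rough sketch of the kind of ordering I mean. It is only an illustration under assumptions, not a confirmed solution: the helper name `prioritize_in_page_order` and the example URLs are made up, and the idea relies on Scrapy's scheduler handling requests with a higher `priority` value first.

```python
from typing import Iterable, List, Tuple


def prioritize_in_page_order(links: Iterable[str]) -> List[Tuple[str, int]]:
    """Pair each link with a priority that decreases from top to bottom.

    Hypothetical helper: the first link on the page gets the largest
    number, because Scrapy schedules higher-priority requests earlier.
    """
    links = list(links)
    total = len(links)
    return [(url, total - position) for position, url in enumerate(links)]


# Made-up article URLs, listed in the order they appear on the page.
article_links = [
    "https://example.com/articles/newest",
    "https://example.com/articles/second",
    "https://example.com/articles/oldest",
]

for url, priority in prioritize_in_page_order(article_links):
    print(priority, url)
    # Inside a spider callback, this would become something like:
    #     yield scrapy.Request(url, callback=self.parse_article, priority=priority)
    # where parse_article is a hypothetical callback name.
```

Even with priorities set this way, I assume responses could still come back out of order while several requests download concurrently, so a strict order would probably also need CONCURRENT_REQUESTS set to 1.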
Any help would be greatly appreciated, thanks!!