
Here is my code:

from scrapy.selector import Selector
from scrapy.http import Request

def parse(self, response):
    selector = Selector(response)
    sites = selector.xpath("//h3[@class='r']/a/@href")
    for index, site in enumerate(sites):
        url = site.extract()
        print url
        yield Request(url=url, callback=self.parsedetail)

def parsedetail(self, response):
    print response.url
    ...
    obj = Store.objects.filter(id=store_obj.id, add__isnull=True)
    if obj:
        obj.update(add=add)

In def parse, Scrapy gets the URLs from Google, and the URL output looks like:

www.test.com
www.hahaha.com
www.apple.com
www.rest.com

But when they are yielded to def parsedetail, the URLs are not in order; they may become:

www.rest.com
www.test.com
www.hahaha.com
www.apple.com

Is there any way to yield the URLs in order to def parsedetail? I need to crawl www.test.com first, because the data provided by the top URL in the Google search results is more accurate.
If there is no data in it, I will go on to the next URL (www.hahaha.com, www.apple.com, www.rest.com) until the empty field is updated.
Please guide me, thank you!


1 Answer


By default, the order in which Scrapy requests are scheduled and sent is not defined. But you can control it using the priority keyword argument:

priority (int) – the priority of this request (defaults to 0). The priority is used by the scheduler to define the order used to process requests. Requests with a higher priority value will execute earlier. Negative values are allowed in order to indicate relatively low-priority.
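For example, here is a minimal sketch of the question's parse method (same XPath as in the question, otherwise hypothetical) that gives earlier Google results a higher priority, so the scheduler dispatches them first:

def parse(self, response):
    sites = Selector(response).xpath("//h3[@class='r']/a/@href")
    for index, site in enumerate(sites):
        # earlier results get a larger priority value,
        # so the scheduler dispatches them before the later ones
        yield Request(url=site.extract(),
                      callback=self.parsedetail,
                      priority=len(sites) - index)

Keep in mind that priority only affects the order in which requests are scheduled; because Scrapy downloads pages concurrently, the responses may still reach parsedetail out of order.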


You can also make the crawling synchronous by passing the callstack around inside the meta dictionary; see this answer for an example.
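As a rough illustration of that approach, the sketch below requests only the top result and carries the remaining URLs in the meta dictionary, falling back to the next result only when the current page yields no data (the extraction line is a placeholder for the elided part of the question's parsedetail):

def parse(self, response):
    urls = [s.extract() for s in Selector(response).xpath("//h3[@class='r']/a/@href")]
    if urls:
        yield Request(url=urls[0],
                      callback=self.parsedetail,
                      meta={'remaining_urls': urls[1:]})

def parsedetail(self, response):
    data = response.xpath("//title/text()").extract()  # placeholder extraction
    if data:
        # got what we need from the highest-ranked page, stop here
        return
    remaining = response.meta.get('remaining_urls', [])
    if remaining:
        # no data on this page, try the next Google result
        yield Request(url=remaining[0],
                      callback=self.parsedetail,
                      meta={'remaining_urls': remaining[1:]})

The trade-off is that only one detail page is processed at a time for a given search, so you lose Scrapy's concurrency for those requests.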
