
Here is my code:

from scrapy.selector import Selector
from scrapy.http import Request

def parse(self, response):
    selector = Selector(response)
    sites = selector.xpath("//h3[@class='r']/a/@href")
    for index, site in enumerate(sites):
        url = site.extract()
        print url
        yield Request(url=url, callback=self.parsedetail)

def parsedetail(self, response):
    print response.url
    ...
    obj = Store.objects.filter(id=store_obj.id, add__isnull=True)
    if obj:
        obj.update(add=add)

In def parse, Scrapy gets the URLs from Google, and the URL output looks like:

www.test.com
www.hahaha.com
www.apple.com
www.rest.com

But when they are yielded to def parsedetail, the URLs are not in order; they may become:

www.rest.com
www.test.com
www.hahaha.com
www.apple.com

Is there any way to yield the URLs in order to def parsedetail? I need to crawl www.test.com first, because the data provided by the top URL in the Google search results is more accurate.
If there is no data in it, I will go on to the next URL (www.hahaha.com, www.apple.com, www.rest.com) until the empty field is updated.
Please guide me, thank you!


1 Answer


By default, the order in which Scrapy requests are scheduled and sent is not defined. But you can control it using the priority keyword argument:

priority (int) – the priority of this request (defaults to 0). The priority is used by the scheduler to define the order used to process requests. Requests with a higher priority value will execute earlier. Negative values are allowed in order to indicate relatively low-priority.
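For example, here is a minimal sketch of the question's parse method (same XPath as in the question, otherwise hypothetical) that gives earlier Google results a higher priority, so the scheduler dispatches them first:

def parse(self, response):
    sites = Selector(response).xpath("//h3[@class='r']/a/@href")
    for index, site in enumerate(sites):
        # earlier results get a larger priority value,
        # so the scheduler dispatches them before the later ones
        yield Request(url=site.extract(),
                      callback=self.parsedetail,
                      priority=len(sites) - index)

Keep in mind that priority only affects the order in which requests are scheduled; because Scrapy downloads pages concurrently, the responses may still reach parsedetail out of order.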


You can also make the crawling synchronous by passing the callstack around inside the meta dictionary; see this answer for an example.
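As a rough illustration of that approach, the sketch below requests only the top result and carries the remaining URLs in the meta dictionary, falling back to the next result only when the current page yields no data (the extraction line is a placeholder for the elided part of the question's parsedetail):

def parse(self, response):
    urls = [s.extract() for s in Selector(response).xpath("//h3[@class='r']/a/@href")]
    if urls:
        yield Request(url=urls[0],
                      callback=self.parsedetail,
                      meta={'remaining_urls': urls[1:]})

def parsedetail(self, response):
    data = response.xpath("//title/text()").extract()  # placeholder extraction
    if data:
        # got what we need from the highest-ranked page, stop here
        return
    remaining = response.meta.get('remaining_urls', [])
    if remaining:
        # no data on this page, try the next Google result
        yield Request(url=remaining[0],
                      callback=self.parsedetail,
                      meta={'remaining_urls': remaining[1:]})

The trade-off is that only one detail page is processed at a time for a given search, so you lose Scrapy's concurrency for those requests.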
