
The code that I can't quite understand comes from here:

def parse_page1(self, response):
    item = MyItem()
    item['main_url'] = response.url
    request = scrapy.Request("http://www.example.com/some_page.html",
                             callback=self.parse_page2)
    request.meta['item'] = item
    yield request

def parse_page2(self, response):
    item = response.meta['item']
    item['other_url'] = response.url
    yield item

From one of the Stack Overflow answers I got a basic idea of when the lines around the yield keyword are executed. But the code above is still hard for me to follow because of its seemingly nested yields.

Can you explain the interaction between the two yields and the callback mechanism? Specifically, how is each of these lines triggered to execute?

Thanks.

lin
1 Answer


The yield keyword is well described here. Actually, in this particular case you don't strictly need yield: each callback produces only a single object, so you can replace each yield with return and get the same result.
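
For instance, here is a minimal sketch of that return variant, reusing the MyItem class and URLs from the question (the surrounding spider class and import scrapy are assumed):

def parse_page1(self, response):
    item = MyItem()
    item['main_url'] = response.url
    request = scrapy.Request("http://www.example.com/some_page.html",
                             callback=self.parse_page2)
    request.meta['item'] = item
    return request  # a single Request: Scrapy accepts it just like a yielded one

def parse_page2(self, response):
    item = response.meta['item']
    item['other_url'] = response.url
    return item  # a single item instead of yielding it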

yield becomes necessary when a callback has to emit several objects. For example, to follow every link on a page you can write the following loop:

for link in response.xpath('//a/@href'):
    # link is a Selector; .get() extracts the href string, urljoin resolves relative URLs
    yield scrapy.Request(response.urljoin(link.get()), callback=self.parse2)

and your parse2 method would be called once for each link. Yielding is simply how a Scrapy callback hands back multiple requests or items. There's no rocket science here.
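
The part that usually causes the "nested yield" confusion is that yielding a Request never calls parse_page2 directly. The yield only hands a Request object to the Scrapy engine; the engine downloads the page and only then invokes the request's callback with the response. As a rough mental model (not Scrapy's real API: the actual engine is asynchronous, and fake_download and crawl below are made-up stand-ins), the loop looks something like this:

from collections import deque

from scrapy import Request
from scrapy.http import TextResponse

def fake_download(request):
    # Stand-in for the downloader: pretend the page was fetched.
    # The response keeps a reference to the request that produced it.
    return TextResponse(url=request.url, request=request,
                        body=b"<html></html>", encoding="utf-8")

def crawl(start_request):
    # Naive, synchronous sketch of the engine's scheduling loop;
    # assumes every request has a generator callback.
    queue = deque([start_request])
    while queue:
        request = queue.popleft()
        response = fake_download(request)        # 1. fetch the page
        for obj in request.callback(response):  # 2. run the callback; each yield
            if isinstance(obj, Request):         #    hands one object back here
                queue.append(obj)                # 3. yielded Requests get scheduled
            else:
                print("scraped:", obj)           # 4. yielded items get collected

# e.g. crawl(Request("http://www.example.com/", callback=spider.parse_page1)),
# where spider is an instance of your spider class

Note that because the response references its request, response.meta in parse_page2 is the very same dict you filled as request.meta in parse_page1; that is how the half-built item travels from one callback to the other. The two generators never actually nest.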

Danil
  • So a `Request` is designed to trigger its callback once it is itself yielded? – lin Jun 07 '18 at 09:07
  • @lin Yes, in my example the callback will be called for each iteration. Each `parse2` call will receive its own response object. – Danil Jun 07 '18 at 09:09
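
To illustrate the timing the comments touch on, in plain Python with no Scrapy at all: a yield by itself runs nothing and calls nothing on the yielded object; the generator's body only advances when its consumer (in Scrapy's case, the engine) asks for the next value:

def callback():
    print("before yield")
    yield "pretend this is a Request"
    print("after yield")

g = callback()   # nothing printed yet: creating a generator does not run its body
obj = next(g)    # prints "before yield"; obj is the yielded value, untouched
# "after yield" prints only when the consumer asks for the next value again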