I'm struggling with Scrapy and I don't understand exactly how passing items between callbacks works. Maybe somebody could help me.
I'm looking into http://doc.scrapy.org/en/latest/topics/request-response.html#passing-additional-data-to-callback-functions
    def parse_page1(self, response):
        item = MyItem()
        item['main_url'] = response.url
        request = scrapy.Request("http://www.example.com/some_page.html",
                                 callback=self.parse_page2)
        request.meta['item'] = item
        return request

    def parse_page2(self, response):
        item = response.meta['item']
        item['other_url'] = response.url
        return item
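To check part of my understanding, I experimented with a plain dict (no Scrapy involved), since meta looks like an ordinary dict. Storing an object and reading it back gives me the same object, not a copy:

```python
# Plain-Python experiment, no Scrapy: meta behaves like a normal dict here.
meta = {}
item = {'main_url': 'http://www.example.com/page1.html'}

meta['item'] = item        # what request.meta['item'] = item seems to do
retrieved = meta['item']   # what item = response.meta['item'] seems to do

assert retrieved is item   # same object, not a copy
retrieved['other_url'] = 'http://www.example.com/some_page.html'
print(item['other_url'])   # the original dict sees the change too
```

So if meta really is just a dict travelling with the request, then parse_page2 would be mutating the very same item that parse_page1 created. Is that right?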
I'm trying to understand the flow of actions there, step by step:
[parse_page1]

1. item = MyItem() <- the object item is created
2. item['main_url'] = response.url <- we assign a value to main_url of the object item
3. request = scrapy.Request("http://www.example.com/some_page.html", callback=self.parse_page2) <- we request a new page and launch parse_page2 to scrape it

[parse_page2]

4. item = response.meta['item'] <- I don't understand this. Are we creating a new object item, or is this the object item created in [parse_page1]? And what does response.meta['item'] mean? In 3 we passed the request only a link and a callback; we didn't add any additional arguments we could refer to...
5. item['other_url'] = response.url <- we assign a value to other_url of the object item
6. return item <- we return the item object as the result of the request

[parse_page1]

7. request.meta['item'] = item <- We assign the object item to the request? But the request is finished, the callback already returned item in 6????
8. return request <- we get the results of the request, so the item from 6, am I right?
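To test my mental model of the whole flow, I also tried simulating the engine in plain Python. FakeRequest and FakeResponse are my own hypothetical stand-ins, not real Scrapy classes, so this only shows what I *think* happens:

```python
# My own toy stand-ins for Scrapy objects (hypothetical, not the real API).
class FakeRequest:
    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback
        self.meta = {}                 # a plain dict attached to the request

class FakeResponse:
    def __init__(self, request):
        self.url = request.url
        self.meta = request.meta       # the response carries the request's meta

def parse_page2(response):
    item = response.meta['item']       # read the item back out of meta
    item['other_url'] = response.url
    return item

def parse_page1(response):
    item = {'main_url': response.url}
    request = FakeRequest("http://www.example.com/some_page.html",
                          callback=parse_page2)
    request.meta['item'] = item        # attach the item BEFORE returning
    return request                     # the "engine" fetches it later

# Simulate the engine: run parse_page1, then "download" the new page
# and call its callback with a response built from that request.
first = FakeResponse(FakeRequest("http://www.example.com/page1.html"))
req = parse_page1(first)
item = req.callback(FakeResponse(req))
print(item)
```

When I run this I get both main_url and other_url in the final item, which would mean lines 7 and 8 of my walkthrough actually execute inside parse_page1 before the request is ever sent, and only afterwards does parse_page2 run. Is that the right picture of the execution order?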
I went through all the Scrapy documentation concerning request/response/meta, but I still don't understand what is happening in points 4 and 7.