
When I yield an item in the parse function, where is it returned to, and where can I access it afterwards? See the sample code below:

from scrapy import Spider, Selector
from scrapy.item import Item, Field


class StackItem(Item):

    title = Field()
    url = Field()

class StackSpider(Spider):
    name = "stack"
    allowed_domains = ["stackoverflow.com"]
    start_urls = [
        "http://stackoverflow.com/questions?pagesize=50&sort=newest"
    ]

    def parse(self, response):
        questions = Selector(response).xpath('//*[@class="summary"]/h3')
        for question in questions:
            item = StackItem()
            item['title'] = question.xpath(
                'a[@class="question-hyperlink"]/text()').extract()
            item['url'] = question.xpath(
                'a[@class="question-hyperlink"]/@href').extract()
            yield item

I am confused about where this item is returned to, and how I can access it later on. Any help would be appreciated. Thanks.

Waqar Joyia
  • Possible duplicate of [What does the yield keyword do in Python?](http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do-in-python) – juanpa.arrivillaga Jun 30 '16 at 21:41

1 Answer


The items yielded in a Scrapy callback method are consumed by the Scrapy engine, which forwards each item to the Item Pipelines.
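To make "consumed by the engine" concrete, here is a toy sketch in plain Python. It is not Scrapy's code: `toy_engine` and `uppercase_title` are hypothetical names, and the real engine is asynchronous and far more involved. It only illustrates that whoever iterates over the generator returned by `parse` receives each yielded item.

```python
def parse(response):
    # Stand-in for a spider callback: yields one dict per "question".
    for title in response:
        yield {"title": title}


def toy_engine(callback, response, pipelines):
    # Hypothetical, greatly simplified consumer (assumption, not Scrapy's API).
    results = []
    for item in callback(response):  # each `yield` hands one item to this loop
        for process in pipelines:    # pass the item through every pipeline step
            item = process(item)
        results.append(item)
    return results


def uppercase_title(item):
    # Hypothetical pipeline step: modify the item and pass it on.
    item["title"] = item["title"].upper()
    return item


processed = toy_engine(parse, ["first question", "second question"], [uppercase_title])
print(processed)
```

So the item is never "returned" to your own code; the caller of `parse` pulls it out of the generator and hands it on.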

So, if you want to do more with your items (data validation, database persistence, etc.), you have to create an Item Pipeline and configure it in your Scrapy project through the `ITEM_PIPELINES` setting. The Item Pipeline page of the Scrapy documentation has examples, and the architecture overview shows where pipelines sit in the data flow:

[Figure: Scrapy architecture]
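As a minimal sketch, a pipeline for the `StackItem` from the question could look like the following. The class name `StackCleanupPipeline`, the project name `stack` and the priority `300` are assumptions for illustration; `process_item`, `open_spider` and `close_spider` are the hook methods Scrapy calls on a pipeline.

```python
# pipelines.py (hypothetical module; the project is assumed to be named "stack")

class StackCleanupPipeline:
    """Receives every item that the spider's parse() yields."""

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.seen = []

    def process_item(self, item, spider):
        # .extract() returns a list of strings, so keep the first match (or None).
        for key in ("title", "url"):
            value = item.get(key)
            if isinstance(value, list):
                item[key] = value[0].strip() if value else None
        self.seen.append(item)
        # Return the item so that any later pipeline also receives it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        print("collected %d items" % len(self.seen))


# settings.py: enable the pipeline; the number (0-1000) sets the run order.
ITEM_PIPELINES = {
    "stack.pipelines.StackCleanupPipeline": 300,
}
```

If you only need the scraped items in a file, no pipeline is required: `scrapy crawl stack -o items.json` writes them out through the feed exports.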

Valdir Stumm Junior