
I am trying to use the link parsing structure described by "warwaruk" in this SO thread: Following links, Scrapy web crawler framework

This works great when grabbing only a single item from each page. However, when I try to use a for loop to scrape all items within each page, it appears that the parse_item function terminates upon reaching the first yield statement. I have a custom pipeline set up to handle each item, but currently it only receives one item per page.

Let me know if I need to include more code, or clarification. THANKS!

def parse_item(self, response):
    hxs = HtmlXPathSelector(response)
    prices = hxs.select("//div[contains(@class, 'item')]/script/text()").extract()
    for prices in prices:
        item = WalmartSampleItem()
        ...
        yield items
Tyler

1 Answer


You should yield the single item you created in the for loop: yield item, not yield items:

for prices in prices:
    item = WalmartSampleItem()
    ...
    yield item
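To see why this works for multiple items per page, note that parse_item is a generator: it does not terminate at the first yield, it pauses there and resumes on the next iteration, so every pass through the loop hands one item to the pipeline. A minimal sketch (no Scrapy required; the list of price strings and the dict stand in for the extracted data and WalmartSampleItem):

```python
# A generator function keeps running after each yield, so a single
# call can produce one item per element of the loop.
def parse_item(prices):
    for price in prices:
        item = {"price": price}  # stand-in for WalmartSampleItem()
        yield item

# Scrapy consumes the generator the same way list() does here:
items = list(parse_item(["$10", "$20", "$30"]))
print(len(items))  # 3
```

If only one item reaches the pipeline, the problem is usually that the XPath matched only one node, or that an early return (rather than yield) ended the generator.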
alecxe
  • It still seems to have the same issue; I just accidentally added the s when I pasted the code in. – Tyler Apr 19 '14 at 00:07