
I'm trying to scrape store locations into a CSV using Scrapy. I'm capturing the right data, but the output looks like this (with the "name" field as an example):

[screenshot: csv output]

Code:

import scrapy

from xx.items import xxItem

class QuotesSpider(scrapy.Spider):
    name = 'xx_spider'
    allowed_domains = ['www.my.xx.com']
    start_urls = [
                'https://my.xx.com/storefinder/list/a',
                ]

    def parse(self, response):  
        rows = response.css('div.col-md-4.col-sm-6')
        for row in rows:
            item = xxItem()  
            item['name'] = rows.css('h3::text').extract()
            item['address'] = rows.css('p::text').extract() 

        return item

1 Answer


A return statement ends the execution of a function call and “returns” the result (the value of the expression following the return keyword) to the caller.

Hence, when you use the return keyword, your parse method stops after producing a single result. Instead, you need to use the yield keyword, which turns the method into a generator that can produce one item per store.

For background, see the Stack Overflow question “What does the ‘yield’ keyword do?”.
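
As a minimal illustration of the difference (plain Python, independent of Scrapy; the function names are made up for this example):

def numbers_with_return():
    for i in range(3):
        return i  # exits the function on the first iteration

def numbers_with_yield():
    for i in range(3):
        yield i  # suspends and resumes, producing every value

print(numbers_with_return())       # 0
print(list(numbers_with_yield()))  # [0, 1, 2]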

Solution:

Replace the statement return item with yield item and move it inside the for loop. Note also that the fields must be extracted from row (the current card), not rows (the full selector list):

Code with changes:

import scrapy

from xx.items import xxItem

class QuotesSpider(scrapy.Spider):
    name = 'xx_spider'
    allowed_domains = ['www.my.xx.com']
    start_urls = [
        'https://my.xx.com/storefinder/list/a',
    ]

    def parse(self, response):
        # one <div> per store card
        rows = response.css('div.col-md-4.col-sm-6')
        for row in rows:
            item = xxItem()
            # extract from the current card (row), not the full list (rows)
            item['name'] = row.css('h3::text').extract()
            item['address'] = row.css('p::text').extract()

            # yield inside the loop so every store produces its own item
            yield item
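
A side note on extract(): it returns a list even when there is a single match, which is one reason CSV cells can look odd. Assuming each card contains exactly one <h3>, recent versions of Scrapy also offer get() and getall() on selectors to make the intent explicit:

item['name'] = row.css('h3::text').get()        # a single string (or None)
item['address'] = row.css('p::text').getall()   # explicitly a list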

To store the data in a CSV file, run your spider with the command:

scrapy crawl xx_spider -o output_file.csv
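
Alternatively, assuming Scrapy 2.1 or newer, you can configure the export once in the project's settings.py via the FEEDS setting instead of passing -o on every run:

# settings.py
FEEDS = {
    'output_file.csv': {'format': 'csv'},
}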

Hope it helps :)
