
I've written a script in Python Scrapy to scrape the name and price from a webpage and write them to a CSV file. The script runs flawlessly.

However, when the crawl finished, I noticed a uniform gap in the CSV file: there was a blank line between every two rows.

So I added a few lines to the spider class to get clean output, and now the CSV output has no blank lines.
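
For context, the blank rows match the Windows symptom Mark Tolonen describes in the comments: `csv.writer` already ends each row with `\r\n`, and a file opened in text mode without `newline=''` translates the `\n` again on Windows, giving `\r\r\n`, which Excel displays as an empty row. The minimal check below (the file name `demo.csv` and the sample row are placeholders, not data from the real page) shows that opening with `newline=''` leaves exactly one `\r\n` per row:

```python
import csv
import os
import tempfile

# Hypothetical sample data; not taken from the real page.
rows = [["name", "price"], ["Sample lipstick", "95.00"]]

path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# newline='' stops Python's text layer from translating the "\r\n"
# that csv.writer already emits, so no stray "\r" can appear.
with open(path, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerows(rows)

# Read the raw bytes back to see the exact line endings on disk.
with open(path, "rb") as f:
    raw = f.read()

print(raw)  # b'name,price\r\nSample lipstick,95.00\r\n'
```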

My question is: did I do this correctly? I never established any relation between "items.py" and "sephsp.py", and yet I'm getting results. Does "items.py" have any control over "sephsp.py"? Finally, in my spider class I declared writer as "global" so that I could access it in the "target_page" method and write the two fields to a CSV file. Thanks in advance.

My CSV output used to have those blank lines between rows. This is fixed now, provided the script below is correct.

Here is the script I tried:

"items.py" includes:

```python
import scrapy

class SephoraItem(scrapy.Item):
    name = scrapy.Field()       # I couldn't find any way to make a bridge between this name and the name in spider class
    price = scrapy.Field()
```

Spider file contains:

```python
import scrapy
import csv

outfile = open("Sephora.csv", "w", newline='')  # newline='' prevents the extra blank rows on Windows
writer = csv.writer(outfile)

class SephoraSpider(scrapy.Spider):
    name = "sephorasp"
    start_urls = ["https://www.sephora.ae/en/stores/"]

    def parse(self, response):
        for link in response.css('ul.nav-primary a.level0::attr(href)').extract():
            yield scrapy.Request(url=link, callback=self.parse_inner_pages)

    def parse_inner_pages(self, response):
        for link in response.css('li.amshopby-cat > a::attr(href)').extract():
            yield scrapy.Request(url=link, callback=self.target_page)

    def target_page(self, response):
        global writer                                   # Here I've used writer as global
        for titles in response.css('div.product-info'):
            name = titles.css('.product-name > a::text').extract_first()
            price = titles.css('span.price::text').extract_first()
            yield {'name': name, 'price': price}  # This line is for bringing the clarity that I've got no issues with printing results
            writer.writerow([name, price])
```
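
Regarding the `global writer` line: Python only requires `global` when a function *assigns* to a module-level name. Reading the name and calling `writer.writerow(...)` works without it. The stripped-down sketch below is plain Python with no Scrapy; `target_page` here is a simplified stand-in for the spider method, and `io.StringIO` replaces the real file so the result can be inspected in memory:

```python
import csv
import io

# Stand-in for the module-level file object; kept in memory for the demo.
buffer = io.StringIO()
writer = csv.writer(buffer)

def target_page(name, price):
    # No `global` statement: the function only looks up the module-level
    # name `writer` and calls a method on it; it never rebinds the name.
    writer.writerow([name, price])

target_page("Sample lipstick", "95.00")
print(repr(buffer.getvalue()))  # 'Sample lipstick,95.00\r\n'
```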

One last thing: if I don't want to declare "writer" as global, what alternative would let the "target_page()" method access the writer and write the two fields?
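
One common alternative to a module-level writer is a Scrapy item pipeline: the spider only yields items, and the pipeline owns the file. Scrapy calls `open_spider(spider)`, `process_item(item, spider)` and `close_spider(spider)` on each enabled pipeline. The sketch below assumes a project module named `myproject.pipelines` and a class named `CsvWriterPipeline`, both hypothetical. It also follows Jon Clements' suggestion of checking that both values exist before writing, and it strips surrounding whitespace from the scraped text (an assumption about what the page returns). Because a pipeline is a plain class, it can be exercised without running a crawl.

```python
import csv


class CsvWriterPipeline:
    """Hypothetical pipeline that writes name/price rows to a CSV file.

    Enable it in settings.py, for example:
    ITEM_PIPELINES = {"myproject.pipelines.CsvWriterPipeline": 300}
    """

    def __init__(self, path="Sephora.csv"):
        self.path = path
        self.outfile = None
        self.writer = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the file here, not at import time.
        self.outfile = open(self.path, "w", newline="")
        self.writer = csv.writer(self.outfile)
        self.writer.writerow(["name", "price"])

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and close the file.
        self.outfile.close()

    def process_item(self, item, spider):
        # Plain dicts and scrapy.Item instances both provide .get().
        name, price = item.get("name"), item.get("price")
        if name and price:
            self.writer.writerow([name.strip(), price.strip()])
        # Return the item so later pipelines and feed exports still receive it.
        return item
```

With this in place, `target_page()` only needs `yield {'name': name, 'price': price}` (or a `SephoraItem`), and the module-level `outfile`/`writer` lines can go. A stricter version could raise `scrapy.exceptions.DropItem` for incomplete items instead of silently skipping them.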

SIM
  • If you're just creating a CSV from this - why not use the available CSV export feed and yield the items from target_page? – Jon Clements Sep 16 '17 at 14:28
  • Something like `for titles in ...: yield {'name': name, 'price': price}` then run your spider as `scrapy crawl sephorasp -o Sephora.csv` ? And as to the blank lines, maybe a simple check that you've got a name and price before yielding? (or maybe you need to take every other product-info etc...) – Jon Clements Sep 16 '17 at 14:29
  • Thanks, Jon Clements, for your response. I'm giving it a go and will get back to you when I'm done. – SIM Sep 16 '17 at 14:31
  • @Jon Clements, I rechecked your response and found that you misunderstood me. Perhaps I didn't make myself clear. I have no issues with the printed results; it is the CSV output that is awkward. In Python 3 this is a common problem when working with Scrapy. – SIM Sep 16 '17 at 14:39
  • @Topto just check you've got both values before you return them in the item pipeline/write to csv file then... (or fix your `.css` so that it only retrieves those elements to start with) – Jon Clements Sep 16 '17 at 14:41
  • What exactly is the question? You say "This is what I **was** getting my csv output" and "This is fixed now...". `newline=''` is the correct way to open file for use with the `csv` module [[ref](https://stackoverflow.com/a/3348664/235698)] to prevent the blank lines you see opening the .csv file in Excel. – Mark Tolonen Sep 16 '17 at 14:53
  • @Mark Tolonen, I've solved it my own way, but I'm doubtful about how I did it. Is it acceptable? – SIM Sep 16 '17 at 15:05
