I've written a script in Python with Scrapy to scrape names and prices from a webpage and write them to a CSV file. The script runs without errors.
However, when the crawl finished, I noticed that the CSV file had a blank line between every two rows.
I then added a few lines to the spider class to get clean output, and now the CSV output no longer has any blank lines.
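To illustrate where such a gap can come from, here is a minimal sketch using only the standard library. It assumes the blank lines were caused by newline translation: the csv writer already ends each row with "\r\n", and a text stream that expands "\n" again (as Windows text mode does) turns that into "\r\r\n". The sample rows are made up, and `io.StringIO` stands in for the real file.

```python
import csv
import io

# Made-up sample rows, only for demonstration.
rows = [["name", "price"], ["Lipstick", "AED 95"]]

# newline="" passes the writer's own "\r\n" row terminators through untouched.
clean = io.StringIO(newline="")
csv.writer(clean).writerows(rows)

# newline="\r\n" mimics Windows text mode: every "\n" written is expanded
# to "\r\n", so each row ends in "\r\r\n" and shows up as an extra blank line.
gapped = io.StringIO(newline="\r\n")
csv.writer(gapped).writerows(rows)

print(repr(clean.getvalue()))
print(repr(gapped.getvalue()))
```

If that assumption holds, opening the output file with newline='' is what removes the gap.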
My question is: did I do this correctly? I did not establish any relation between "items.py" and "sephsp.py", and yet I'm getting results. Does "items.py" have any control over "sephsp.py"? Finally, in my spider class I declared writer as "global" so that it is accessible inside the "target_page" method, where the two fields are written to the CSV file. Thanks in advance.
This is what my CSV output looked like before: Click to see. This is fixed now, assuming the script below is correct.
Here is the script I tried:
"items.py" includes:
import scrapy

class SephoraItem(scrapy.Item):
    name = scrapy.Field()  # I couldn't find any way to make a bridge between this name and the name in spider class
    price = scrapy.Field()
Spider file contains:
import scrapy
import csv

outfile = open("Sephora.csv", "w", newline='')
writer = csv.writer(outfile)

class SephoraSpider(scrapy.Spider):
    name = "sephorasp"
    start_urls = ["https://www.sephora.ae/en/stores/"]

    def parse(self, response):
        for link in response.css('ul.nav-primary a.level0::attr(href)').extract():
            yield scrapy.Request(url=link, callback=self.parse_inner_pages)

    def parse_inner_pages(self, response):
        for link in response.css('li.amshopby-cat > a::attr(href)').extract():
            yield scrapy.Request(url=link, callback=self.target_page)

    def target_page(self, response):
        global writer  # Here I've used writer as global
        for titles in response.css('div.product-info'):
            name = titles.css('.product-name > a::text').extract_first()
            price = titles.css('span.price::text').extract_first()
            yield {'name': name, 'price': price}  # This line is for bringing the clarity that I've got no issues with printing results
            writer.writerow([name, price])
One last thing: if I don't want to declare "writer" as global, what is the alternative for making writer available inside the "target_page()" method so that it can write the two fields?
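One possible direction, as a minimal sketch rather than a confirmed answer: move the file handling out of the spider and into an item pipeline, so that the writer lives on an object instead of at module level. The class name `CsvWriterPipeline` and its `path` parameter are hypothetical. The hook names `open_spider`, `process_item` and `close_spider` are the ones Scrapy documents for item pipelines, though their exact signatures may differ between Scrapy versions.

```python
import csv


class CsvWriterPipeline:
    """Hypothetical item pipeline that owns the CSV file and its writer."""

    def __init__(self, path="Sephora.csv"):
        # Output path; the default matches the file name used in the spider.
        self.path = path

    def open_spider(self, spider):
        # Called once when the spider starts: open the file a single time.
        # newline="" leaves the csv module's own line endings untouched.
        self.outfile = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.writer(self.outfile)

    def process_item(self, item, spider):
        # Called for every item the spider yields: write one row per item.
        self.writer.writerow([item["name"], item["price"]])
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and close the file.
        self.outfile.close()
```

With something like this, "target_page" would only need to yield {'name': name, 'price': price}, and the pipeline would have to be enabled through the ITEM_PIPELINES setting in the project's settings.py, using whatever dotted path the class actually lives at.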