I am learning how to use XPath to scrape websites with Python and Scrapy, and I am stuck on the following.
I am looking at a Dutch website, http://www.ah.nl/producten/bakkerij/brood, where I want to scrape the names of the products.
Eventually I want a CSV file with the names of all these breads. When I inspect the elements in the browser, I can see where these names are defined.
I need to find the right XPath to extract "AH Tijgerbrood bruin heel". This is what I tried in my spider:
import scrapy

from stack.items import DmozItem


class DmozSpider(scrapy.Spider):
    name = "ah"
    allowed_domains = ["ah.nl"]
    start_urls = ['http://www.ah.nl/producten/bakkerij/brood']

    def parse(self, response):
        for sel in response.xpath('//div[@class="product__description small-7 medium-12"]'):
            item = DmozItem()
            item['title'] = sel.xpath('h1/text()').extract()
            yield item
When I crawl with this spider, I don't get any results. I have no clue what I am missing here.
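To rule out a mistake in the expression itself, here is a minimal sketch that runs the same two XPath steps against a hand-written fragment, using only the standard library. The markup in SAMPLE is an assumption based on what the inspector shows, not the real page source, and the second product name and the extract_titles helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, written by hand to mimic what the browser inspector
# shows; the HTML that Scrapy actually downloads may look different.
SAMPLE = """
<html>
  <body>
    <div class="product__description small-7 medium-12">
      <h1>AH Tijgerbrood bruin heel</h1>
    </div>
    <div class="product__description small-7 medium-12">
      <h1>Example product</h1>
    </div>
  </body>
</html>
"""


def extract_titles(markup):
    """Apply the same div/h1 lookup as the spider to a markup string."""
    root = ET.fromstring(markup.strip())
    titles = []
    # Same predicate as in the spider: an exact match on the full class string.
    for div in root.findall('.//div[@class="product__description small-7 medium-12"]'):
        # Same relative step as in the spider: direct h1 children of the div.
        for h1 in div.findall('h1'):
            titles.append((h1.text or "").strip())
    return titles


titles = extract_titles(SAMPLE)
print(titles)
```

If the expression finds the titles here but the crawl still returns nothing, the HTML that Scrapy receives probably differs from what the inspector displays. Running `scrapy shell` on the URL and trying `response.xpath(...)` there would show what the downloaded response really contains.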