I am using Scrapy 1.2 with XPath (and Python 3.4) to scrape the Hot 100 chart on billboard.com. When I use the second option in the code, every song's item gets all 100 titles. I understand that's because of the leading `//`, which searches the whole page instead of just the current `article`, but I cannot get the first option to work. How can I make sure I get only the correct title for each song?
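import scrapy


class MusicItem(scrapy.Item):
    # Assumed minimal definition; in the original project MusicItem lives in items.py.
    title = scrapy.Field()


class MusicalSpider(scrapy.Spider):
    name = "musicalspider"
    allowed_domains = ["billboard.com"]
    start_urls = ['http://www.billboard.com/charts/hot-100/']

    def parse(self, response):
        # One selector per <article> row in the chart
        songs = response.xpath('//div[@class="chart-data js-chart-data"]/div[@class="container"]/article')
        for song in songs:
            item = MusicItem()
            # first option: path relative to the current <article>; returns nothing
            item['title'] = song.xpath('div[@class="chart-row__primary"]/div[@class="chart-row__main-display"]/div[@class="chart-row__container"]/div[@class="chart-row__title"]/h2[@class="chart-row__song"]').extract()
            # second option: the leading // searches the whole page, so each item gets all 100 titles
            item['title'] = song.xpath('//h2[@class="chart-row__song"]').extract()
            yield item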