So I'm trying to scrape the schedule at this page.. http://stats.swehockey.se/ScheduleAndResults/Schedule/3940
..with this code.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
class SchemaSpider(BaseSpider):
name = "schema"
allowed_domains = ["http://stats.swehockey.se/"]
start_urls = [
"http://stats.swehockey.se/ScheduleAndResults/Schedule/3940"
]
def parse(self, response):
hxs = HtmlXPathSelector(response)
rows = hxs.select('//table[@class="tblContent"]/tbody/tr')
for row in rows:
date = row.select('/td[1]/div/span/text()').extract()
teams = row.select('/td[2]/text()').extract()
print date, teams
But I can't get it to work. What am I doing wrong? I've been trying to figure out myself for a couple of hours now but I have no idea why my XPath doesn't work properly.