This linked tutorial on web scraping with Scrapy is a little outdated. It seems that the Selector XPath has changed.
When I copy the XPath, I get:
def parse(self, response):
    questions = Selector(response).xpath('//*[@id="question-header"]/h1/a')
But the code from the linked tutorial is:
class StackSpider(Spider):
    name = "stack"
    allowed_domains = ["stackoverflow.com"]
    start_urls = [
        "http://stackoverflow.com/questions?pagesize=50&sort=newest",
    ]

    def parse(self, response):
        questions = Selector(response).xpath('//div[@class="summary"]/h3')
        for question in questions:
            item = StackItem()
            item['title'] = question.xpath(
                'a[@class="question-hyperlink"]/text()').extract()[0]
            item['url'] = question.xpath(
                'a[@class="question-hyperlink"]/@href').extract()[0]
            yield item
How do I construct the generator with the new Selector?
This is the SO question we are scraping as an example: Spring data @transactional not rolling back with SQL Server and after runtimeexception
Following Matthew Daniels' suggestions:
In [4]: response
Out[4]: <200 https://stackoverflow.com/questions/27624141/spring-data-transactional-not-rolling-back-with-sql-server-and-after-runtimeexc>
In [5]: response.css(".question-hyperlink").xpath("@href").extract_first()
Out[5]: '/questions/27624141/spring-data-transactional-not-rolling-back-with-sql-server-and-after-runtimeexc'
In [6]: response.css(".summary h3")
Out[6]: []
In [7]: response.css("#question-header > h1 > a")
Out[7]: [<Selector xpath="descendant-or-self::*[@id = 'question-header']/h1/a" data='<a href="/questions/27624141/spring-data'>]
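To make the question concrete, here is a rough, standard-library sketch of the generator shape I am after. It is not Scrapy code: `xml.etree.ElementTree` stands in for Scrapy's Selector so the snippet runs on its own, and `PAGE` is a made-up, simplified fragment of a single question page (only the title and href come from the session above; the real markup is certainly more complex).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment of a single question page.
# Only the title and href are taken from the shell session above.
PAGE = """
<html><body>
  <div id="question-header">
    <h1><a class="question-hyperlink" href="/questions/27624141/spring-data-transactional-not-rolling-back-with-sql-server-and-after-runtimeexc">Spring data @transactional not rolling back with SQL Server and after runtimeexception</a></h1>
  </div>
</body></html>
"""


def parse(root):
    """Yield one item per matched link, mirroring the spider's parse() generator."""
    # Same path as //*[@id="question-header"]/h1/a; ElementTree needs the
    # leading "." because it only supports relative searches.
    for link in root.findall(".//*[@id='question-header']/h1/a"):
        yield {"title": link.text, "url": link.get("href")}


root = ET.fromstring(PAGE)
items = list(parse(root))

# The listing-page path from the tutorial matches nothing in this fragment,
# which is consistent with the empty Out[6] above.
old_matches = root.findall(".//div[@class='summary']/h3")

print(items)
print(old_matches)
```

In Scrapy itself I would expect the loop to run over `response.xpath('//*[@id="question-header"]/h1/a')`, with `link.xpath("text()").extract_first()` and `link.xpath("@href").extract_first()` filling the item, but I have not confirmed that against the live page.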