0

I am scraping individual listing pages from justproperty.com (individual listing from the original question no longer active).

I want to get the value of the Ref field.

This is my XPath:

>>> sel.xpath('normalize-space(.//div[@class="info_div"]/table/tbody/tr/td[normalize-space(text())="Ref:"]/following-sibling::td[1]/text())').extract()[0]

This returns no results in Scrapy, despite working in my browser.

Charles Duffy
Marco Dinatsoli
  • By requiring folks who want to test proposed answers to your question to write their own code to actually download the document, you're decreasing the number and quality of answers you're likely to get. It would be helpful if you provided a proper SSCCE for your problem (see http://www.sscce.org/) – Charles Duffy Feb 27 '14 at 17:54
  • @CharlesDuffy I had a problem, I tried to solve it. I couldn't. So, I asked here. The question is so clear – Marco Dinatsoli Feb 27 '14 at 17:58
  • It's clear, but it requires more work for people to test their answers than it would if your reproducer were self-contained. – Charles Duffy Feb 27 '14 at 17:59
  • @alecxe How did you test the XPath in the browser, please? Would you tell me, because I have never done that before. Thanks – Marco Dinatsoli Feb 27 '14 at 18:03

3 Answers

2

The following works perfectly in lxml.html (which modern Scrapy uses):

sel.xpath('.//div[@class="info_div"]//td[text()="Ref:"]/following-sibling::td[1]/text()')

Note that I'm using // to get between the div and the td, not laying out the explicit path. I'd have to take a closer look at the document to grok why, but the path given in that area was incorrect.
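A minimal sketch of why the `//` step is the more forgiving choice, run directly against lxml.html rather than a Scrapy selector. The original listing is no longer available, so the markup, the `REF-001` value, and the names `ROW`, `WITHOUT_TBODY`, `WITH_TBODY`, `REF_XPATH`, and `ref_values` are all assumptions made up for illustration:

```python
import lxml.html

# Hypothetical markup: the real listing page is gone, so this row is invented.
ROW = '<tr><td>Ref:</td><td>REF-001</td></tr>'

# The same table, once as raw source without <tbody> and once with it.
WITHOUT_TBODY = ('<html><body><div class="info_div"><table>%s</table>'
                 '</div></body></html>' % ROW)
WITH_TBODY = ('<html><body><div class="info_div"><table><tbody>%s</tbody>'
              '</table></div></body></html>' % ROW)

# The expression from this answer: "//" matches td at any depth under the div.
REF_XPATH = ('.//div[@class="info_div"]//td[text()="Ref:"]'
             '/following-sibling::td[1]/text()')


def ref_values(html):
    """Parse the HTML with lxml.html and return the matching text nodes."""
    return lxml.html.fromstring(html).xpath(REF_XPATH)


print(ref_values(WITHOUT_TBODY))
print(ref_values(WITH_TBODY))
```

Because the descendant step does not name the intermediate elements, the same expression should match whether or not a `tbody` sits between the `table` and its rows.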

Charles Duffy
2

Don't create XPath expressions by looking at Firebug or Chrome Dev Tools; they show the markup after the browser has changed it (browsers insert tbody elements that are not in the source HTML). Remove the /tbody axis step and you'll receive exactly what you're looking for.

normalize-space(.//div[@class="info_div"]/table/tr/td[
  normalize-space(text())="Ref:"
]/following-sibling::td[1]/text())

Read "Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?" for more details.
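The difference can be checked outside the browser with a short sketch against lxml.html. The page source below is hypothetical (the original listing is no longer online), and `SOURCE`, `TEMPLATE`, `browser_style`, and `source_style` are names invented for this example:

```python
import lxml.html

# Hypothetical page source: the raw HTML contains no <tbody> element.
SOURCE = """
<html><body><div class="info_div"><table>
  <tr><td> Ref: </td><td>
      REF-001
  </td></tr>
</table></div></body></html>
"""

# The XPath from the question, with a slot for the optional /tbody step.
TEMPLATE = ('normalize-space(.//div[@class="info_div"]/table%s/tr/td['
            'normalize-space(text())="Ref:"]/following-sibling::td[1]/text())')

doc = lxml.html.fromstring(SOURCE)

# Path copied from browser dev tools: includes the tbody the browser added.
browser_style = doc.xpath(TEMPLATE % '/tbody')

# Path written against the raw source: no tbody step.
source_style = doc.xpath(TEMPLATE % '')

print(repr(browser_style))
print(repr(source_style))
```

Since the outer `normalize-space()` turns an empty node-set into an empty string, a wrong path fails silently here instead of raising an error, which is easy to mistake for a missing value on the page.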

Jens Erat
0

Another XPath that gets the same thing: (.//td[@class='titles']/../td[2])[1]

I tried your XPath using XPath Checker and it works fine.

dparpyani