xpath doesn't work in this website

Question

I am scraping individual listing pages from justproperty.com (individual listing from the original question no longer active).

I want to get the value of the Ref field.

This is my XPath:

>>> sel.xpath('normalize-space(.//div[@class="info_div"]/table/tbody/tr/td[normalize-space(text())="Ref:"]/following-sibling::td[1]/text())').extract()[0]

This returns no results in Scrapy, despite working in my browser.

By requiring folks who want to test proposed answers to your question to write their own code to actually download the document, you're decreasing the number and quality of answers you're likely to get. It would be helpful if you provided a proper SSCCE for your problem (see http://www.sscce.org/) — Charles Duffy, Feb 27 '14 at 17:54
@CharlesDuffy I had a problem, i tried to solve it. I couldn't. So, I asked here. the question is so clear — Marco Dinatsoli, Feb 27 '14 at 17:58
It's clear, but it requires more work for people to test their answers than it would if your reproducer were self-contained. — Charles Duffy, Feb 27 '14 at 17:59
@alecxe how did you test the xpath in the browser please? would you tell me because I have never done that before. thanks — Marco Dinatsoli, Feb 27 '14 at 18:03

score 2 · Accepted Answer · answered Feb 27 '14 at 18:03

The following works perfectly in lxml.html (which modern Scrapy uses):

sel.xpath('.//div[@class="info_div"]//td[text()="Ref:"]/following-sibling::td[1]/text()')

Note that I'm using // to get between the div and the td, not laying out the explicit path. I'd have to take a closer look at the document to grok why, but the path given in that area was incorrect.
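To illustrate why the descendant step is more forgiving, here is a minimal sketch using only the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector. The markup, the value AB-123, and the helper ref_value are all hypothetical, not taken from the real page. ElementTree supports only a subset of XPath (no following-sibling axis, no normalize-space), so the sibling cell is picked in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the table as served (no tbody) ...
RAW = """<html><body><div class="info_div"><table>
<tr><td>Ref:</td><td>AB-123</td></tr>
<tr><td>Type:</td><td>Villa</td></tr>
</table></div></body></html>"""

# ... and the same table with a tbody wrapper, as a browser inspector shows it.
BROWSER_DOM = RAW.replace("<table>", "<table><tbody>").replace(
    "</table>", "</tbody></table>"
)


def ref_value(markup):
    """Return the text of the cell following the 'Ref:' label, or None."""
    root = ET.fromstring(markup)
    # '//' between the div and tr matches rows at any depth,
    # so it does not matter whether a tbody sits in between.
    for row in root.findall(".//div[@class='info_div']//tr"):
        cells = row.findall("td")
        if len(cells) >= 2 and (cells[0].text or "").strip() == "Ref:":
            return (cells[1].text or "").strip()
    return None


print(ref_value(RAW))
print(ref_value(BROWSER_DOM))
```

Both calls should find the same cell, which is the point of using // here: the expression does not depend on the exact nesting between the div and the row.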

score 2 · Answer 2 · edited May 23 '17 at 11:43

Don't create XPath expressions by looking at Firebug or Chrome Dev Tools; they show the markup after the browser has changed it (browsers insert tbody elements into tables when building the DOM). Remove the /tbody axis step and you'll receive exactly what you're looking for.

normalize-space(.//div[@class="info_div"]/table/tr/td[
  normalize-space(text())="Ref:"
]/following-sibling::td[1]/text())

Read "Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?" for more details.
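A small sketch of the effect, again using the standard library's xml.etree.ElementTree rather than Scrapy itself, and a made-up page fragment (the markup, the value AB-123, and the helper has_literal_tbody are assumptions for illustration). The path with the tbody step finds nothing in the served markup, while the same path without it finds the row.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source: there is no <tbody> in the markup itself.
SOURCE = """<html><body><div class="info_div"><table>
<tr><td>Ref:</td><td>AB-123</td></tr>
</table></div></body></html>"""

root = ET.fromstring(SOURCE)

# Path as copied from a browser inspector, including the inserted tbody.
with_tbody = root.findall(".//div[@class='info_div']/table/tbody/tr")

# Same path with the tbody step removed.
without_tbody = root.findall(".//div[@class='info_div']/table/tr")


def has_literal_tbody(markup):
    """Rough check: does the raw source actually contain a tbody tag?"""
    return "<tbody" in markup.lower()


print(len(with_tbody), len(without_tbody), has_literal_tbody(SOURCE))
```

Running a check like has_literal_tbody against the downloaded response body, rather than the browser's element inspector, is one quick way to see which form of the path the scraper needs.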

score 0 · Answer 3 · answered Feb 27 '14 at 18:07

Another XPath that gets the same thing: (.//td[@class='titles']/../td[2])[1]

I tried your XPath using XPath Checker and it works fine.

answered by dparpyani, Feb 27 '14 at 18:07

Working against the browser DOM doesn't mean it works against the original document. – Charles Duffy Jun 14 '15 at 14:18
