why does this xpath selector fail?

Question

Given the following HTML:

<p>
    <div class="allpricing">
      <p class="priceadorn">
          <FONT CLASS="adornmentsText">NOW:&nbsp;</FONT>
          <font CLASS="adornmentsText">$1.00</font>
      </p>
    </div>
</p>

why does

//div[@class="allpricing"]/p[@class="priceadorn"][last()]/font[@class="adornmentsText"][last()]

return the expected value of $1.00

but adding the p element

//p/div[@class="allpricing"]/p[@class="priceadorn"][last()]/font[@class="adornmentsText"][last()]

returns nothing?

score 3 · Accepted Answer · edited May 23 '17 at 12:27

You cannot place a div inside a p: the div start tag implicitly closes the open p. See "Nesting block level elements inside the <p> tag... right or wrong?"

answered Sep 03 '12 at 16:59

choroba

Interesting. In my ignorance I was under the impression that XPath doesn't care about the names of the nodes (i.e., it just matches `<x>` with `</x>` and `<y>` with `</y>`, for whatever values of x and y), but this suggests that if the HTML is invalid, the XPath will not work. Why would XPath care whether the HTML is valid, as long as the number and type of nodes align? – jela Sep 03 '12 at 17:08
XPath works on parsed HTML. If your HTML cannot be parsed, it does not work at all. If it can be parsed, it works - even if you think the structure of the document is different than the parser thinks. – choroba Sep 03 '12 at 17:10
XPath works with XML files. I don't think it cares whether you put a `<div>` inside a `<p>` as long as it's in a valid XML structure and tags are ended properly. Correct me if I'm wrong. – fahad.hasan Sep 03 '12 at 17:16
@jela: did it solve your problem? I'll learn something new if it did :) – fahad.hasan Sep 03 '12 at 17:19
@ShutterBug: XPath can work with DOM objects. You can build a DOM object from both XML and HTML. – choroba Sep 03 '12 at 17:24
@ShutterBug well the problem is not solved (ie, I can't select via the opening `p` element), but this appears to be a result of the fact that xpath requires valid html. My understanding was the same as yours (any valid XML file would do) but apparently this is not the case. – jela Sep 03 '12 at 17:47

score 0 · Answer 2 · answered Sep 03 '12 at 17:11

I've often found that mismatched letter case was the culprit. XPath 1.0 is case-sensitive, and unless you handle mixed case explicitly, expressions will often fail to match.

fahad.hasan

score 0 · Answer 3 · answered Sep 03 '12 at 17:31

XPath is case-sensitive.

None of the provided XPath expressions selects any node, because in the provided XML document there is no font element with an attribute named class (the element font has a CLASS attribute and this is different from having a class attribute due to the different capitalization).

Due to the same reason, font and FONT are elements with different names.

These two XPath expressions, when evaluated against the provided XML document, produce the same wanted result:

   //div[@class="allpricing"]
       /p[@class="priceadorn"]
                       [last()]
          /font[@CLASS="adornmentsText"]
                               [last()]

and

//p/div[@class="allpricing"]
      /p[@class="priceadorn"]
                        [last()]
         /font[@CLASS="adornmentsText"]
                                   [last()]

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  '//div[@class="allpricing"]
       /p[@class="priceadorn"]
                       [last()]
          /font[@CLASS="adornmentsText"]
                               [last()]'/>
=============
  <xsl:copy-of select=
   '//p/div[@class="allpricing"]
          /p[@class="priceadorn"]
                            [last()]
             /font[@CLASS="adornmentsText"]
                                       [last()]
   '/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<p>
    <div class="allpricing">
      <p class="priceadorn">
          <FONT CLASS="adornmentsText">NOW:&#xA0;</FONT>
          <font CLASS="adornmentsText">$1.00</font>
      </p>
    </div>
</p>

the two expressions are evaluated and the results of this evaluation are copied to the output:

<font CLASS="adornmentsText">$1.00</font>
=============
  <font CLASS="adornmentsText">$1.00</font>
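For readers without an XSLT processor at hand, roughly the same check can be sketched with Python's standard library. This is only an approximation: `xml.etree.ElementTree` supports a limited subset of XPath, and its `.//` does not match the element it is called on, so the markup is wrapped in a hypothetical `<root>` element that is not part of the original document. The helper name `select` is likewise made up for illustration.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <root> element because
# ElementTree's ".//" only searches below the element it is called on.
XML = """<root><p>
    <div class="allpricing">
      <p class="priceadorn">
          <FONT CLASS="adornmentsText">NOW:&#xA0;</FONT>
          <font CLASS="adornmentsText">$1.00</font>
      </p>
    </div>
</p></root>"""

root = ET.fromstring(XML)

# Shared tail of both expressions; {attr} is the attribute name tested on <font>.
TAIL = '/p[@class="priceadorn"][last()]/font[@{attr}="adornmentsText"][last()]'


def select(prefix, attr):
    """Return the text of every element matched by prefix + TAIL."""
    path = prefix + TAIL.format(attr=attr)
    return [el.text for el in root.findall(path)]


upper_div = select('.//div[@class="allpricing"]', "CLASS")
upper_p_div = select('.//p/div[@class="allpricing"]', "CLASS")
lower_div = select('.//div[@class="allpricing"]', "class")

print(upper_div, upper_p_div, lower_div)
# ['$1.00'] ['$1.00'] []
```

When the document is parsed as XML, names keep their case: both expressions find `$1.00` with `@CLASS`, and the lowercase `@class` test on `font` matches nothing.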

In my PHP script, when I use the first selector (starting with `//div`), I get the correct result, $1.00, but when I use the second selector, I get no result. The only difference between the selectors is the `p` node. This appears to conflict with your results. Is it possible that the XPath implementation I am using is not working properly? Or perhaps it is only intended to be used on valid HTML? – jela Sep 03 '12 at 17:53
@jela, Any compliant and non-buggy XPath implementation must produce the same results when the XPath expressions are evaluated on the *same* XML document. I have performed the transformation with all 10 different XSLT processors I have available (each with its own XPath implementation) and they all produce exactly the same, correct result. – Dimitre Novatchev Sep 03 '12 at 18:01

score 0 · Answer 4 · answered Sep 03 '12 at 21:50

You describe your source as an HTML rather than an XML document, but you haven't explained how you parsed it. If you parse it using an HTML parser, the parser will "repair" it to turn it into valid HTML, which means that the tree it constructs doesn't directly reflect what you wrote in the source. XPath sees this "repaired" tree, not the original.
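The effect can be illustrated with a toy tree builder on top of Python's `html.parser.HTMLParser`. This is a deliberately simplified sketch, not how PHP's or any browser's parser is implemented: the class name `ToyRepairingParser`, the `CLOSES_P` tag set, and the synthetic `body` root are all assumptions made for the example, and it applies only one repair rule (a block-level start tag closes an open `p`). `HTMLParser` also lower-cases tag and attribute names, as HTML parsers commonly do.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

HTML = """<p>
    <div class="allpricing">
      <p class="priceadorn">
          <FONT CLASS="adornmentsText">NOW:&nbsp;</FONT>
          <font CLASS="adornmentsText">$1.00</font>
      </p>
    </div>
</p>"""

# Hypothetical, heavily simplified set of tags whose start tag closes an open <p>.
CLOSES_P = {"div", "p", "ul", "ol", "table"}


class ToyRepairingParser(HTMLParser):
    """Builds an ElementTree while applying a single HTML repair rule."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("body")  # synthetic root, an assumption
        self.stack = [self.root]        # currently open elements

    def _close(self, tag):
        # Pop back to the nearest open element named `tag`;
        # a stray end tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive already lower-cased.
        if tag in CLOSES_P:
            self._close("p")  # the repair: <div> implicitly ends an open <p>
        attrib = {k: v or "" for k, v in attrs}
        self.stack.append(ET.SubElement(self.stack[-1], tag, attrib))

    def handle_endtag(self, tag):
        self._close(tag)

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs to its tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


parser = ToyRepairingParser()
parser.feed(HTML)
parser.close()
root = parser.root

TAIL = '/p[@class="priceadorn"][last()]/font[@class="adornmentsText"][last()]'
top_level = [child.tag for child in root]
without_p = [e.text for e in root.findall('.//div[@class="allpricing"]' + TAIL)]
with_p = [e.text for e in root.findall('.//p/div[@class="allpricing"]' + TAIL)]

print(top_level, without_p, with_p)
# ['p', 'div'] ['$1.00'] []
```

Under these assumptions the outer `p` and the `div` end up as siblings rather than parent and child, so `//p/div` has nothing to match, while the expression starting at `//div` still finds `$1.00`. Real parsers apply many more rules and differ in the details.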
