There's an important note in the W3C document on XPath 1.0 (W3C Recommendation 16 November 1999):
XML Path Language (XPath) Version 1.0
2 Location Paths
2.5 Abbreviated Syntax
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
A similar note appears in the document on XPath 3.1 (W3C Recommendation 21 March 2017):
XML Path Language (XPath) 3.1
3 Expressions
3.3 Path Expressions
3.3.5 Abbreviated Syntax
NOTE: The path expression //para[1] does not mean the same as the path expression /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their respective parents.
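This follows from how the double slash expands: // is an abbreviation for /descendant-or-self::node()/, so //para[1] is really /descendant-or-self::node()/child::para[1]. The step to the right of // (together with its predicate) is evaluated separately for every node the left-hand side yields, i.e. for the starting node and each of its descendants, and a positional predicate such as [1] counts only among the children of that one node.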
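The difference described in the two notes can be reproduced with a small sketch. It assumes Python's standard xml.etree.ElementTree, whose limited XPath subset accepts .//para[1] but has no descendant:: axis, so the second path is emulated with iter(). The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the question).
doc = ET.fromstring(
    "<doc>"
    "<sec><para>a</para><para>b</para></sec>"
    "<sec><para>c</para></sec>"
    "</doc>"
)

# //para[1]: every <para> that is the first <para> child of its parent.
first_children = [p.text for p in doc.findall(".//para[1]")]

# /descendant::para[1]: the single first <para> in document order.
# ElementTree has no descendant:: axis, so iter() stands in for it here.
first_descendant = next(doc.iter("para")).text

print(first_children)    # ['a', 'c']
print(first_descendant)  # a
```

Both sections contribute their own first para to the first result, while the second result is a single node.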
So the exact meaning of the predicate in this path
//div[ descendant::table/descendant::td[4] ]
is:
- build a sequence of all <table> elements that are descendants of the current <div>,
- for every such <table> build a sequence of all its descendant <td> elements and keep only the fourth item of that sequence (the [4] belongs to the descendant::td step, so it is applied per table, not to the combined result),
- the predicate is true if at least one table yields such a fourth cell.
Finally the path returns all <div> elements in the document that contain a nested table with at least four data cells among its descendants. Since there are tables in the document which have 4 cells or more (including cells in nested tables, of course), the whole expression selects their respective <div> ancestors.
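As a rough illustration of this context-dependent predicate, the sketch below emulates it by hand with Python's standard xml.etree.ElementTree (which cannot evaluate the descendant:: axis itself). The page content and the helper name relative_predicate are hypothetical, and the code assumes the [4] filter applies to the <td> descendants of each table, as the XPath grammar binds a predicate to its step.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one div wraps a 2x2 table, the other has no table.
page = ET.fromstring(
    "<body>"
    "<div id='a'><table>"
    "<tr><td>1</td><td>2</td></tr>"
    "<tr><td>3</td><td>4</td></tr>"
    "</table></div>"
    "<div id='b'><p>no table</p></div>"
    "</body>"
)

def relative_predicate(div):
    """Emulates descendant::table/descendant::td[4] with div as context."""
    for table in div.iter("table"):      # descendant::table
        cells = list(table.iter("td"))   # descendant::td of this table
        if len(cells) >= 4:              # [4]: a fourth cell exists
            return True
    return False

selected = [d.get("id") for d in page.iter("div") if relative_predicate(d)]
print(selected)  # ['a']
```

Only the div that actually contains the table passes, because the predicate is evaluated afresh for each div.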
On the other hand the predicate in
//div[ //table//td[4] ]
means:
- scan the whole document tree for <table> elements (more precisely, test the root node and each of its descendants for <table> children),
- for every table found, scan its subtree for elements having a fourth <td> child (i.e. test whether the table or any of its descendants has at least four <td> children).
Please note that the predicate subexpression does not depend on the context node. It is an absolute path that resolves to some sequence of nodes (possibly empty), so the predicate's boolean value depends only on the document's structure. If it is true, the whole path returns a sequence of all <div> elements in the document; otherwise it returns the empty sequence.
So the predicate is true iff some table, or some element inside a table, has at least 4 <td> children.
As far as I can see, every <tr> row contains two or three cells. There is no element with 4 or more <td> children, so the predicate subexpression returns an empty sequence, the predicate is false and every <div> gets filtered out. The result is: nothing (empty sequence).
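The all-or-nothing behaviour of the second path can be sketched the same way. The code below is a hand-written emulation using Python's standard xml.etree.ElementTree, not a real XPath evaluation; the function name select_divs and both sample pages are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def select_divs(xml_text):
    """Emulates //div[ //table//td[4] ] on the given document."""
    root = ET.fromstring(xml_text)
    # //table//td[4]: some element at or below a table has a 4th <td> child.
    # Evaluated once for the whole document, not once per div.
    predicate = any(
        len(node.findall("td")) >= 4
        for table in root.iter("table")
        for node in table.iter()
    )
    return [d.get("id") for d in root.iter("div")] if predicate else []

# Hypothetical pages: rows of two cells versus one row of four cells.
two_per_row = (
    "<body><div id='a'><table>"
    "<tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr>"
    "</table></div><div id='b'/></body>"
)
four_in_row = (
    "<body><div id='a'><table>"
    "<tr><td>1</td><td>2</td><td>3</td><td>4</td></tr>"
    "</table></div><div id='b'/></body>"
)

print(select_divs(two_per_row))  # []
print(select_divs(four_in_row))  # ['a', 'b']
```

With four cells in one row even the div that holds no table is returned, which shows that the predicate says nothing about the individual div being tested.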