XQuery: // vs descendant-or-self::node()

Question

Recently I needed to evaluate an XQuery on a node of an HTML document. Basically, I needed to select all elements with an href attribute from the first child of the body element. I've added a small example to explain:

<html>
    <body>
        <a href="http://www.google.be"/>
    </body>
</html>

The desired extraction result is in this case obviously:

<a href="http://www.google.be"/>

My first idea was to use //body/*[1]//*[@href] because:

//body matches the body element, wherever it is
/*[1] matches the first child of the body element
//*[@href] matches all descendants or self of the current element

I figured that would work but on the example provided, the XQuery gives no results.

However, I read up a bit and found the following (source: http://www.keller.com/xslt/8/):

Alternate notation for "//": descendant-or-self::node()

So I changed my XQuery to //body/*[1]/descendant-or-self::node()[@href] and this time, the results were correct.

My question: what is the difference between // and descendant-or-self::node()? What I found here (What's the difference between //node and /descendant::node in xpath?) and here (http://www.w3.org/TR/xpath/#axes) says:

// is short for /descendant-or-self::node()/. For example, //para is short for /descendant-or-self::node()/child::para.

Which leads me to conclude that // and /descendant-or-self::node() are not interchangeable (probably because of the terminating / at the end?), but then can someone tell me if there is a shorthand for /descendant-or-self::node()?

`//` is shorthand for `/descendant-or-self::node()/` _including the leading and trailing slashes_, there is no shorthand notation for `descendant-or-self::node()` _without_ the slashes, you'd have to spell it out in full. — Ian Roberts, Jan 20 '14 at 20:53
Thanks Ian, that was concise yet to the point. Long story short: there is no shorthand without the slashes. — RDM, Jan 20 '14 at 23:55
@IanRoberts Moreover, one could stress that the '/' in /descendant-or-self::node() is referring to the very root node of the DOM; so the implementation starts looking at each and every node in the tree. This might be hard to grasp for a beginner. — muenalan, Jan 19 '18 at 10:49

score 5 · Accepted Answer · answered Jan 20 '14 at 18:38

Your first XPath expression (//body/*[1]//*[@href]) actually represents what you described in natural language: //body/*[1] is the first child of the body element, and //*[@href] selects every element below it that has an @href attribute.

In your example, there is no element below the anchor tag having such an attribute. For example, this query would match the anchor in the following document:

<html>
    <body>
        <p>
            <a href="http://www.google.be"/>
        </p>
    </body>
</html>

The non-abbreviated version of this query is:

//body/*[1]/descendant-or-self::node()/*[@href]

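Compare that with your second query, which applies the predicate to descendant-or-self::node() directly, without the extra child step, so the first child of body itself can match: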

//body/*[1]/descendant-or-self::node()[@href]
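If you want to check this without an XPath engine at hand, here is a minimal Python sketch that emulates the two step expansions by hand using the standard-library ElementTree. It does not evaluate the XPath strings themselves; the helper names (first_child_of_body, abbreviated, spelled_out) and the constants FLAT and NESTED are made up for this illustration. It also only walks element nodes, which is enough here because text nodes carry no attributes and have no children.

```python
import xml.etree.ElementTree as ET

# The document from the question and the nested variant from this answer.
FLAT = '<html><body><a href="http://www.google.be"/></body></html>'
NESTED = '<html><body><p><a href="http://www.google.be"/></p></body></html>'


def first_child_of_body(xml_text):
    """Rough stand-in for //body/*[1] (assumes exactly one body element)."""
    root = ET.fromstring(xml_text)
    body = next(root.iter("body"))
    return list(body)[0]


def abbreviated(start):
    """Emulates start//*[@href], i.e. descendant-or-self::node()/child::*[@href].

    Candidates are the children of start or of any of its descendants,
    so start itself is never tested.
    """
    return [child
            for node in start.iter()      # start and all its descendants
            for child in node             # the extra child::* step
            if "href" in child.attrib]


def spelled_out(start):
    """Emulates start/descendant-or-self::node()[@href].

    The predicate is applied to start and its descendants directly.
    """
    return [node for node in start.iter() if "href" in node.attrib]


for name, doc in (("flat", FLAT), ("nested", NESTED)):
    start = first_child_of_body(doc)
    print(name,
          [e.tag for e in abbreviated(start)],
          [e.tag for e in spelled_out(start)])
# flat [] ['a']
# nested ['a'] ['a']
```

On the flat document the abbreviated form finds nothing because the anchor is the starting node, not a child of it; on the nested document both forms find the anchor.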

score 1 · Answer 2 · answered Jan 21 '14 at 00:20

I think the problem is in your description: it does not appear to match your example!

Given the input:

<html>
    <body>
        <a href="http://www.google.be"/>
    </body>
</html>

and the requirements statement:

"all elements with an href attribute from the first child of the body element"

Your XPath formulation of:

//body/*[1]//*[@href]

matches your requirements statement. But, the expected output would be an empty sequence, exactly as you have found... and NOT the output you suggested:

<a href="http://www.google.be"/>

To get the suggested output, your XPath requirements statement would instead perhaps be:

"the first child of the body element with an href attribute", which would lead to the XPath:

//*[@href][parent::body][1]

From your requirements statement and the mismatched example, it is hard to be sure exactly what you meant. So perhaps your requirements statement is:

"the first element in the body with a href attribute"

If that is the case, then I would suggest the XPath:

($input//*[@href][ancestor::body])[1]

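Note that the sequence constructor, i.e. the '(' and ')', flattens the descendant sequence(s) into a single sequence, which lets you address each selected descendant by position, much like an array. Without the parentheses, [1] would be evaluated separately for each parent.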
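To make the difference between the two suggestions concrete, here is a rough Python sketch that emulates both selections by hand with the standard-library ElementTree, rather than running the XPath expressions. The sample document, its example.com URLs, and the function names are invented for this illustration, and the sketch assumes a single body element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a nested link comes before a direct child link.
DOC = (
    '<html><body>'
    '<p><a href="http://example.com/nested"/></p>'
    '<a href="http://example.com/direct"/>'
    '</body></html>'
)


def first_href_child_of_body(root):
    """Emulates //*[@href][parent::body][1].

    The [1] is evaluated per parent, so this yields the first child
    of body that carries an href attribute.
    """
    hits = []
    for body in root.iter("body"):
        candidates = [child for child in body if "href" in child.attrib]
        hits.extend(candidates[:1])
    return hits


def first_href_inside_body(root):
    """Emulates (//*[@href][ancestor::body])[1].

    All matches are gathered into one sequence in document order
    first, and only then is the first item taken.
    """
    matches = [elem
               for body in root.iter("body")
               for elem in body.iter()
               if elem is not body and "href" in elem.attrib]
    return matches[:1]


root = ET.fromstring(DOC)
print([e.get("href") for e in first_href_child_of_body(root)])
# ['http://example.com/direct']
print([e.get("href") for e in first_href_inside_body(root)])
# ['http://example.com/nested']
```

On the question's original document, both functions return the same single anchor, which is why the two readings of the requirement cannot be told apart from that example alone.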
