51

I have an XML file that's structured like this:

 <foo>
     <bar></bar>
     <bar></bar>
     ...
</foo>

I don't know how to grab a range of nodes. Could someone give me an example of an XPath expression that grabs bar nodes 100-200?

Jules
Shawn

4 Answers

91

Use:

/*/bar[position() >= 100 and not(position() > 200)]

Do note:

  1. Exactly the bar elements at position 100 to 200 (inclusive) are selected.

  2. Evaluating this XPath expression can be many times faster than evaluating one that uses the // abbreviation, because // causes a complete scan of the subtree rooted at the context node. Avoid the // abbreviation whenever you can.
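
To make the inclusive, 1-based range concrete, here is a minimal Python sketch using only the standard library. It rests on some assumptions: `xml.etree.ElementTree` supports only a small XPath subset that does not include `position()` comparisons, so the range is emulated with a list slice; the 250-element document and the helper name `bars_in_range` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: <foo> with 250 <bar> children numbered from 1.
root = ET.Element("foo")
for i in range(1, 251):
    ET.SubElement(root, "bar").text = str(i)

def bars_in_range(foo, first, last):
    """Emulate bar[position() >= first and not(position() > last)].

    XPath positions are 1-based and the range is inclusive at both ends.
    """
    bars = foo.findall("bar")      # direct children only, like /*/bar
    return bars[first - 1:last]    # shift to 0-based; slice end is exclusive

selected = bars_in_range(root, 100, 200)
```

With a full XPath 1.0 engine, the expression in the answer can be passed to the engine directly; the slice only mirrors its result for direct children of one parent.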

Dimitre Novatchev
  • If you remember, could you go into your second point? Will // and /*/ return the same results, and yet still the latter is faster? – Bram Vanroy Apr 27 '16 at 13:31
  • @BramVanroy, both `//` and `/*/` are syntactically invalid XPath expressions. I am guessing that you meant `//*` and `/*/*`. The answer is that in this specific case the time taken should be approximately the same. However, if predicates are involved, the expression including `//` will have to scan a whole tree (even the descendants of the finally selected nodes) and filter each node, while the exact expression avoids such a scan. Do note also that there are **some** XPath processors that are highly optimized to process `//*` and similar expressions efficiently. – Dimitre Novatchev Apr 27 '16 at 14:00
10
//foo/bar[100 <= position() and position() <= 200]
kennytm
9

Isn't fn:subsequence the best way?

subsequence( /foo/bar, 100, 101 )

returns all items from position 100 through 200, that is, 101 items (or fewer if the source sequence is shorter).
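
The start-and-length arithmetic can be checked with a rough Python analogue. This is only a sketch for integer arguments, not the XPath function itself: it keeps the items whose 1-based position `p` satisfies `start <= p < start + length`, and the Python function name simply mirrors `fn:subsequence` for readability.

```python
def subsequence(items, start, length):
    """Rough analogue of fn:subsequence for integer start and length.

    Keeps items whose 1-based position p satisfies start <= p < start + length.
    """
    begin = max(start - 1, 0)          # positions below 1 do not exist
    end = max(start - 1 + length, 0)   # exclusive 0-based end of the window
    return items[begin:end]

full = subsequence(list(range(1, 251)), 100, 101)    # source longer than the range
short = subsequence(list(range(1, 151)), 100, 101)   # source ends at position 150
```

Here `full` holds positions 100 through 200, while `short` stops at 150 because the source sequence runs out.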

CiaPan
  • While this may be correct, the use of position() in a predicate is more general since it can be applied at multiple levels. E.g.: /foo/bar[2 <= position() and position() < 5]/x[5 <= position() and position() < 10] – Donald Rich Feb 22 '18 at 13:18
  • @DonaldRich Yes – but that causes `position()` to be evaluated at every item in the sequence (which may be much more than those 100 items finally selected), while `subsequence()` is called just once. Anyway, this talk is about 'how can we select all nodes in the sequence at positions 100 through 200', and **not** about 'can we avoid using `position()` for every possible sequence filtering requirement?' My answer relates directly to — and just to — the question asked. – CiaPan Feb 22 '18 at 15:56
  • 1
    @DonaldRich BTW, provided we _have_ `subsequence()` available, your query can be easily rewritten as `subsequence( subsequence( /foo/bar, 2, 3)/x, 5, 5)` – CiaPan Feb 22 '18 at 15:56
  • 1
    CiaPan: the function `subsequence()` is defined in the W3C "Functions and Operators" specification only from XPath 2.0 onward. The OP is probably not using XPath 2.0, so the solutions that use `position()` are more general: they also work in XPath 1.0, and we don't have to care which version of XPath our environment supports. – Dimitre Novatchev Nov 10 '19 at 04:49
1

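To select a range, use position(). Here are two ways. With a single predicate and 'and':

//foo//bar[position() >= 100 and position() <= 200]

or with two chained predicates. The upper bound has to come first, because the second predicate counts positions within the nodes kept by the first:

//foo//bar[position() <= 200][position() >= 100]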
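
When positional predicates are chained, each predicate renumbers positions from 1 within the node set left by the previous one, so the order of the bounds matters. The following Python sketch emulates that rule on plain integers standing in for 300 sibling bar elements; the helper `keep_positions` is hypothetical and only models the renumbering, it is not an XPath engine.

```python
def keep_positions(nodes, test):
    """Apply one predicate: positions restart at 1 within `nodes`."""
    return [n for pos, n in enumerate(nodes, start=1) if test(pos)]

bars = list(range(1, 301))  # stand-ins for 300 <bar> siblings

# bar[position() >= 100 and position() <= 200]
both = keep_positions(bars, lambda p: 100 <= p <= 200)

# bar[position() <= 200][position() >= 100]
upper_first = keep_positions(
    keep_positions(bars, lambda p: p <= 200), lambda p: p >= 100)

# bar[position() >= 100][position() <= 200]
lower_first = keep_positions(
    keep_positions(bars, lambda p: p >= 100), lambda p: p <= 200)
```

In this model `both` and `upper_first` cover elements 100 through 200, while `lower_first` covers 100 through 299, because the second predicate keeps the first 200 of the nodes that survived the first.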