181

There is an HTML file (whose contents I do not control) that has several input elements all with the same fixed id attribute of "search_query". The contents of the file can change, but I know that I always want to get the second input element with the id attribute "search_query".

I need an XPath expression to do this. I tried //input[@id="search_query"][2] but that does not work. Here is an example XML string where this query failed:

<div>
  <form>
    <input id="search_query" />
  </form>
</div>

<div>
  <form>
    <input id="search_query" />
  </form>
</div>

<div>
  <form>
    <input id="search_query" />
  </form>
</div>

Keep in mind that the above is merely an example: the other HTML code can be quite different, and the input elements can appear anywhere with no consistent document structure (except that I am guaranteed there will always be at least two input elements with an id attribute of "search_query").

What is the correct XPath expression?

Dimitre Novatchev
rlandster
  • Good question, +1. See my answer for a complete explanation of the problem and for the wanted solution. – Dimitre Novatchev Oct 24 '10 at 15:43
  • 11
    Minor point: you should never have more than one element with a given ID (and so the HTML in the question is actually invalid). In practice, browsers will let you do it anyway, but if you do you're missing out on the only benefit of using IDs, which is that they signal "I'm unique" (whereas classes are designed to be used for non-unique signifiers). – machineghost Sep 16 '16 at 20:40
  • Not a minor point @machineghost ! It is actually a bug! ID stands for unique identifier! – Eftychia Thomaidou Nov 23 '21 at 10:09

2 Answers

345

This is a FAQ:

//somexpression[$N]

means "Find every node selected by //somexpression that is the $Nth such node among the children of its parent".

What you want is:

(//input[@id="search_query"])[2]

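Remember: The [] operator has higher precedence (priority) than the // abbreviation, so without the parentheses [2] is applied at each parent separately rather than to the combined result.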
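
The difference can be illustrated with a minimal Python sketch, under a few assumptions. The three fragments from the question are wrapped in a hypothetical <root> element so that they form one well-formed document, and a name attribute (not in the original markup) is added only to tell the inputs apart. Python's standard-library ElementTree implements a subset of XPath and does not accept parenthesized expressions, so (//input[@id="search_query"])[2] is emulated here by collecting all matches in document order and indexing the resulting list. ElementTree's positional predicate is only an approximation of the XPath rule, but on this sample it gives the same empty result.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's fragments wrapped in a single <root>,
# with an illustrative "name" attribute added to distinguish the inputs.
SAMPLE = """
<root>
  <div><form><input id="search_query" name="first" /></form></div>
  <div><form><input id="search_query" name="second" /></form></div>
  <div><form><input id="search_query" name="third" /></form></div>
</root>
""".strip()

root = ET.fromstring(SAMPLE)

# Position is evaluated per parent: each <form> has only one <input>,
# so no element is ever "the second one" within its parent.
per_parent = root.findall(".//input[@id='search_query'][2]")

# Emulation of (//input[@id="search_query"])[2]: gather every match in
# document order first, then pick the second item of the whole list.
all_matches = root.findall(".//input[@id='search_query']")
second = all_matches[1]  # XPath positions are 1-based, Python lists 0-based

print(len(per_parent))     # number of per-parent matches
print(second.get("name"))  # which input was picked overall
```

With a full XPath 1.0 engine, the parenthesized expression above can be passed in directly and no list indexing is needed.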

Dimitre Novatchev
  • 10
    I like this answer. I had not considered a precedence issue (I just assumed simple left-to-right precedence). – rlandster Oct 24 '10 at 16:30
  • 15
    @rlandster: The word "precedence" may be confusing. The unabbreviated form of `//input[@id='search_query'][2]` is: `/descendant-or-self::node()/child::input[attribute::id='search_query'][position()=2]` –  Oct 24 '10 at 20:35
  • 49
    For those who got here from Google - the numbering starts from 1 - [1] being the first element and so on – Jan Mares Dec 07 '18 at 15:45
29

This seems to work:

/descendant::input[@id="search_query"][2]

I got this from "XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition" by Michael Kay.

There is also a note in the "Abbreviated Syntax" section of the XML Path Language specification http://www.w3.org/TR/xpath/#path-abbrev that provided a clue.
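
Why this form works can be sketched in Python: /descendant::input[...] is a single step, so the position in [2] is counted across all descendants of the document rather than within each parent. The standard-library ElementTree does not support the descendant:: axis, so the sketch below only emulates it with Element.iter(), which walks the tree in document order. The helper name nth_descendant, the <root> wrapper, and the name attribute are all hypothetical and introduced purely for illustration.

```python
from itertools import islice
import xml.etree.ElementTree as ET


def nth_descendant(root, tag, attr, value, n):
    """Return the n-th (1-based) element with the given tag whose attribute
    equals value, counted in document order over the whole tree, or None."""
    # iter() visits root and all of its descendants in document order.
    matches = (e for e in root.iter(tag) if e.get(attr) == value)
    # Skip the first n - 1 matches and take the next one, if any.
    return next(islice(matches, n - 1, None), None)


# Hypothetical sample document based on the question's markup.
root = ET.fromstring(
    "<root>"
    '<div><form><input id="search_query" name="first" /></form></div>'
    '<div><form><input id="search_query" name="second" /></form></div>'
    '<div><form><input id="search_query" name="third" /></form></div>'
    "</root>"
)

second = nth_descendant(root, "input", "id", "search_query", 2)
print(second.get("name"))
```

Because the generator is consumed lazily, the traversal stops as soon as the requested match has been found.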

rlandster
  • 1
    Many thanks for this answer. In my case the accepted solution would not work as I'm using the xpath in robot framework, which wouldn't accept paths starting with brackets. This one however, should do the trick – dahui Sep 23 '15 at 10:40
  • When I try this: ${el_my_value}= XML.Get Element ${x} .//isbn – LTL Oct 21 '21 at 09:22
  • It leads to this: Multiple elements (6) matching './/isbn' found. how can I find the 4th? – LTL Oct 21 '21 at 09:23