Get second element text with XPath?

Question

<span class='python'>
  <a>google</a>
  <a>chrome</a>
</span>

I want to get chrome and have it working like this already.

q = item.findall('.//span[@class="python"]//a')
t = q[1].text # first element = 0

I'd like to combine it into a single XPath expression and just get one item instead of a list.
I tried this but it doesn't work.

t = item.findtext('.//span[@class="python"]//a[2]') # first element = 1

And the actual, not simplified, HTML is like this.

<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>

Your expression `.//span[@class="python"]//a[2]` works for me. — Ken Bloom, Nov 07 '10 at 13:42
Hmmm it seems I have a mistake somewhere, or the simplification of the actual HTML I posted is _too_ simple. I'll try and then modify the question. — , Nov 07 '10 at 13:47
@pdnsk: Good question, +1. See my answer for an explanation and for a simple solution. :) — Dimitre Novatchev, Nov 07 '10 at 15:37
so glad you posted this question. Been trying to figure out a similar problem for about a day. — Fractal, Jun 19 '19 at 14:58

score 42 · Accepted Answer · edited Mar 09 '14 at 18:51

I tried this but it doesn't work.
t = item.findtext('.//span[@class="python"]//a[2]')

This is a FAQ about the // abbreviation.

.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So it may select more than one element, or none, depending on the concrete XML document.

To put it more simply, the [] operator has higher precedence than //.

If you want just one node (the second) out of all the nodes returned, you have to use parentheses to force the precedence you want:

(.//a)[2]

This really selects the second a descendant of the current node.

For the actual expression used in the question, change it to:

(.//span[@class="python"]//a)[2]

or change it to:

(.//span[@class="python"]//a)[2]/text()
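A minimal sketch of the difference, assuming Python's standard-library `xml.etree.ElementTree` (whose limited XPath subset does not accept the parenthesized form) and a hypothetical `<div>` wrapper standing in for `item`. The second match in document order is taken with `itertools.islice` to emulate `(...)[2]`:

```python
import xml.etree.ElementTree as ET
from itertools import islice

# Hypothetical wrapper element; the inner markup is the question's actual HTML.
NESTED = """
<div>
  <span class='python'>
    <span>
      <span>
        <img></img>
        <a>google</a>
      </span>
      <a>chrome</a>
    </span>
  </span>
</div>
"""

item = ET.fromstring(NESTED)
path = './/span[@class="python"]//a'

# [2] applies to the last step only: "an a that is the second a child of its parent".
per_parent = item.findtext(path + '[2]')

# Emulate (path)[2]: take the second match in document order, or None if there is none.
second = next(islice(item.iterfind(path), 1, None), None)
second_text = second.text if second is not None else None

print(per_parent)
print(second_text)
```

Here `per_parent` is `None` because each `a` is the first `a` child of its own parent, while `second_text` is `'chrome'`. With only one match, `second_text` is also `None`, so no extra length check on a list is needed.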

answered Nov 07 '10 at 15:37

Dimitre Novatchev

Thank you for the explanation, but I have one question, or actually two. If there is only one matching element, will `[2]` throw an exception or return `None`? And do you know why this works with `xpath` but not `findtext`? – Nov 07 '10 at 15:51
@pdnsk: My answer is pure XPath. I don't know Python. – Dimitre Novatchev Nov 07 '10 at 16:13
I tried and it just returns no element, which is good because one reason why I wanted to avoid lists and have it in a single expression is to not have an additional check. – Nov 07 '10 at 16:30
Been trying to figure out a similar answer for a full day. Thanks a ton for the help! – Fractal Jun 19 '19 at 14:58

score 2 · Answer 2 · answered Nov 07 '10 at 13:56

I'm not sure what the problem is...

>>> d = """<span class='python'>
...   <a>google</a>
...   <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>

MattH

score 2 · Answer 3 · 2010-11-07T14:29:33.467

From Comments:

or the simplification of the actual HTML I posted is too simple

You are right. What is the meaning of .//span[@class="python"]//a[2]? This will be expanded to:

self::node()
 /descendant-or-self::node()
  /child::span[attribute::class="python"]
   /descendant-or-self::node()
    /child::a[position()=2]

It will finally select the second a child (position() refers to the child axis). So nothing will be selected if your document is like:

<span class='python'> 
  <span> 
    <span> 
      <img></img> 
      <a>google</a><!-- This is the first "a" child of its parent --> 
    </span> 
    <a>chrome</a><!-- This is also the first "a" child of its parent --> 
  </span> 
</span>

If you want the second of all descendants, use:

descendant::span[@class="python"]/descendant::a[2]
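The point about `position()` can be checked with a short sketch, assuming Python's standard-library `xml.etree.ElementTree`; the names `parent_of` and `positions` are illustrative only. It computes, for each `a`, its position among the `a` children of its own parent:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
""")

# ElementTree elements do not store their parent, so build a child -> parent map.
parent_of = {child: parent for parent in doc.iter() for child in parent}

# For each a, its 1-based position among the a children of its own parent.
positions = {
    a.text: parent_of[a].findall('a').index(a) + 1
    for a in doc.iter('a')
}

print(positions)
```

Both elements come out at position 1, which is why a trailing `a[2]` on the child axis matches nothing in this document.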

edited Nov 07 '10 at 14:29

answered Nov 07 '10 at 14:10

It works with `xpath` but not with `findtext`, and returns a list with one item. – Nov 07 '10 at 14:40
@pdknsk: That's because this XPath expression returns a node-set: it could be empty, a singleton, or contain many nodes (there could be many spans with a "python" class and a second a descendant). If you want the **string value** of the first of these results, use the `string()` function with this expression as its argument. I don't know what data types your `xpath` method can return. – Nov 07 '10 at 14:49
It works. I used a combination of the previous answer, with `/text()`, and this answer, but I'll accept this answer because it details the problem. I only have one question. What is the short equivalent to `/descendant::`? – Nov 07 '10 at 15:14
@pdknsk: First, `text()` returns all the text node children, while `string()` (or the DOM method for the string value) returns the concatenation of all descendant text nodes. **They are not the same.** Second, there is no abbreviated form for the `descendant` axis. My last expression is equivalent to `(.//span[@class="python"]//a)[2]`, so the `position()` predicate gets applied to the whole expression, not just the last step. – Nov 07 '10 at 20:25
