Good afternoon, today I got an HTML document to parse.

<!DOCTYPE html>
<html>
<body>

<table name="test" style="width:100%">
  <tr>
    <th>First name</th>
    <th>Last name</th>
    <th>Age</th>
  </tr>
  <tr>
    <td>Jill</td>
    <td>Smith</td>
    <td>50</td>
  </tr>
  <tr>
    <td>Eve</td>
    <td>Jackson</td>
    <td>94</td>
  </tr>
  <tr>
    <td>John</td>
    <td>Doe</td>
    <td>80</td>
  </tr>
</table>

</body>
</html>

Basically, it's a very simple table. I know how to parse such a document with Python and lxml, and I have also managed to retrieve most of the information I need from this kind of document.

Nonetheless, I'm having trouble getting the text value of the th element that has the same position() as a given td element.

What I've done so far:

With an XPath like this one, I retrieve all of my td elements:

/html/body/table[@name='test']/tr/td

I then apply another XPath to each td element to get the matching th element.

I'm using something like this:

./ancestor::table/tr/th[position()=count(./preceding-sibling::td)+1]

Nonetheless, this is not working: my count function returns 0. I guess the path I'm giving (./preceding-sibling::td) is referring to th/preceding-sibling::td. Thus, since no td element exists in the same row as the th, the count function returns 0. I'd like to refer to the td element I'm querying instead.
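
To make the symptom concrete, here is a minimal sketch that reproduces it with lxml on a shortened copy of the table above. The names `DOC`, `BROKEN` and `headers` are only illustrative. Because the count is 0 for every th, the predicate reduces to position()=1 and the first header comes back for every cell.

```python
from lxml import html

# Shortened copy of the table from the question (one data row is enough).
DOC = """
<html><body>
<table name="test">
  <tr><th>First name</th><th>Last name</th><th>Age</th></tr>
  <tr><td>Jill</td><td>Smith</td><td>50</td></tr>
</table>
</body></html>
"""

# The expression from the question, unchanged.
BROKEN = "./ancestor::table/tr/th[position()=count(./preceding-sibling::td)+1]"

tree = html.fromstring(DOC)
cells = tree.xpath("/html/body/table[@name='test']/tr/td")

# Inside the predicate, "." is the th being tested, not the td we started
# from, so count(...) is 0 and only the first th satisfies position()=1.
headers = [td.xpath(BROKEN)[0].text for td in cells]
print(headers)
```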

Yet I don't know how to do that, and the only good answer I found on the subject (xpath: find table cell with same position in different row) relies on the user knowing an identifier for the td to find. I can't hardcode a td text value in my XPath.

Is there any way to do that with XPath only?

Thanks for any help you can provide.

EDIT:

Related question: Current node vs. Context node in XSLT/XPath?

According to that answer, the "." in my predicate selects the context node, which in my case is the th element. What I need is to select the current node, which is the td element to which I'm applying this piece of code:

lxmlelement.xpath('./ancestor::table/tr/th[position()=count(./preceding-sibling::td)+1]')
Kaël
  • What element are you trying to select? – Steve Wellens Jun 28 '16 at 14:08
  • `/html/body/table[@name='test']/tr/td[.="Smith"]/ancestor::table/tr/th[position()=count(/html/body/table[@name='test']/tr/td[.="Smith"]/preceding-sibling::td)+1]` works this way. In your case, XPath resolves the second `.` against the last node in the path (the th in this case) – splash58 Jun 28 '16 at 14:08
  • So, you need to get this `count(./preceding-sibling::td)` with a separate xpath, i.e. do the search in several steps – splash58 Jun 28 '16 at 14:10
  • @splash58 Firstly, thanks for your reply. In my case I can't give the name of a specific td, since that xpath is to be used to retrieve the th for any td. My HTML file is far larger than that, so I can't write a separate xpath for each td element. That's why I'm first retrieving all the td elements with the first xpath, and then applying another xpath to each one. To answer Steve Wellens: with the second xpath I'm trying to select the th element that has the same index in its row as the td element I'm applying the xpath to. – Kaël Jun 28 '16 at 14:20
  • I see. That is what I wrote about: first, read the list of all the td elements you need; second, loop over that list and, for each td, get the count, then use your second xpath with position() = the numeric count you have at that point – splash58 Jun 28 '16 at 14:24
  • @splash58 After searching for a few hours, it seems there is indeed no other real solution. I'm really disappointed: there are solutions to do that with XSLT, but not with pure XPath. – Kaël Jun 29 '16 at 13:13
  • What language do you use? – splash58 Jun 29 '16 at 13:23
  • Python with lxml library. – Kaël Jun 29 '16 at 13:35
  • @Kaël unfortunately I know nothing about Python :( – splash58 Jun 29 '16 at 14:09
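
The two-step approach described in the comments above could be sketched in Python with lxml roughly as follows. This is only an illustration, not a pure-XPath answer: the helper name `header_for` and the variable name `$pos` are hypothetical, and the sketch assumes that two XPath calls per cell are acceptable and that the header cells sit in a tr directly under the nearest table. It relies on lxml passing keyword arguments of `xpath()` into the expression as XPath variables.

```python
from lxml import html

# The table from the question.
DOC = """
<html><body>
<table name="test" style="width:100%">
  <tr><th>First name</th><th>Last name</th><th>Age</th></tr>
  <tr><td>Jill</td><td>Smith</td><td>50</td></tr>
  <tr><td>Eve</td><td>Jackson</td><td>94</td></tr>
  <tr><td>John</td><td>Doe</td><td>80</td></tr>
</table>
</body></html>
"""


def header_for(td):
    """Return the text of the th in the same column as td (hypothetical helper)."""
    # Step 1: evaluated with td as the context node, so the count really
    # refers to the td's own preceding siblings. lxml returns a float here.
    index = int(td.xpath("count(preceding-sibling::td)")) + 1
    # Step 2: hand the number back to XPath as the variable $pos.
    matches = td.xpath("ancestor::table[1]/tr/th[position() = $pos]", pos=index)
    return matches[0].text if matches else None


tree = html.fromstring(DOC)
cells = tree.xpath("/html/body/table[@name='test']/tr/td")
pairs = [(header_for(td), td.text) for td in cells]

for header, value in pairs:
    print(header, "=", value)
```

The XPath itself stays generic this way: no td text is hardcoded, only the computed index is passed in for each cell.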

0 Answers