Good afternoon. Today I have an HTML document to parse:
<!DOCTYPE html>
<html>
<body>
<table name="test" style="width:100%">
<tr>
<th>First name</th>
<th>Last name</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
<tr>
<td>Eve</td>
<td>Jackson</td>
<td>94</td>
</tr>
<tr>
<td>John</td>
<td>Doe</td>
<td>80</td>
</tr>
</table>
</body>
</html>
Basically, it's a very simple table. I know how to parse such a document with Python and lxml, and I have already managed to retrieve most of the information I need from this kind of document.
However, I'm having trouble getting the text of the th element that has the same position() as a given td element.
What I've done so far:
With an XPath like this one, I retrieve all of my td elements:
/html/body/table[@name='test']/tr/td
I then apply a second XPath to each td element to get the matching th element.
I'm using something like this:
./ancestor::table/tr/th[position()=count(./preceding-sibling::td)+1]
However, this does not work: the count function always returns 0. I guess the path I'm giving (./preceding-sibling::td) is evaluated relative to the th being tested, i.e. as th/preceding-sibling::td. Since no td element exists in the same row as the th elements, count returns 0. I'd like it to refer to the td element I'm querying from instead.
I don't know how to do that, and the only good answer I found on the subject (xpath: find table cell with same position in different row) relies on the user knowing an identifier for the td to find. I can't hardcode a td text value in my XPath.
Is there any way to do this with XPath only?
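To make the behaviour reproducible, here is a minimal sketch of what I'm running. DOC is a shortened copy of the table above, and header_for is just a helper name I made up for this example. Whatever td I start from, I get the first th back:

```python
from lxml import html

# Shortened copy of the table from the question (two data rows).
DOC = """
<html><body>
<table name="test">
<tr><th>First name</th><th>Last name</th><th>Age</th></tr>
<tr><td>Jill</td><td>Smith</td><td>50</td></tr>
<tr><td>Eve</td><td>Jackson</td><td>94</td></tr>
</table>
</body></html>
"""

tree = html.fromstring(DOC)
cells = tree.xpath("/html/body/table[@name='test']/tr/td")


def header_for(td):
    # Hypothetical helper: applies the second XPath to one td element.
    # Inside the predicate, "." is the th being tested, not the td,
    # so count(./preceding-sibling::td) is 0 for every th.
    found = td.xpath(
        "./ancestor::table/tr/th[position()=count(./preceding-sibling::td)+1]"
    )
    return [th.text for th in found]


results = [(td.text, header_for(td)) for td in cells]
for cell_text, headers in results:
    print(cell_text, headers)
```

Every line printed pairs the cell text with ['First name'], which matches the count of 0 described above.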
Thanks for any help you can provide.
EDIT:
Current node vs. Context node in XSLT/XPath?
According to that answer, the "." in my predicate selects the context node, which inside the predicate is the th element. What I need is the current node, i.e. the td element on which I'm calling this piece of code:
lxmlelement.xpath('./ancestor::table/tr/th[position()=count(./preceding-sibling::td)+1]')
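For reference, the only workaround I can see so far is not XPath-only. It is a sketch that splits the query in two: first count the preceding td siblings with the td as context node, then hand that number to the second expression as an XPath variable. lxml's xpath() accepts variables as keyword arguments. The names header_for and $idx are my own, and DOC is again a shortened copy of the table:

```python
from lxml import html

# Shortened copy of the table from the question (two data rows).
DOC = """
<html><body>
<table name="test">
<tr><th>First name</th><th>Last name</th><th>Age</th></tr>
<tr><td>Jill</td><td>Smith</td><td>50</td></tr>
<tr><td>Eve</td><td>Jackson</td><td>94</td></tr>
</table>
</body></html>
"""

tree = html.fromstring(DOC)


def header_for(td):
    # Step 1: here the context node really is the td, so this counts
    # the td's own preceding siblings (count() returns a float).
    index = int(td.xpath("count(preceding-sibling::td)")) + 1
    # Step 2: pass the 1-based column index in as the variable $idx.
    found = td.xpath("ancestor::table/tr/th[position()=$idx]", idx=index)
    return found[0].text if found else None


pairs = [
    (header_for(td), td.text)
    for td in tree.xpath("/html/body/table[@name='test']/tr/td")
]
for header, value in pairs:
    print(header, "=", value)
```

This pairs each cell with the header of its own column, but it needs Python between the two expressions, which is what I was hoping to avoid.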