Can't parse data from `th` tag along with `td` tag from different tables

Question

I've written a Python script that uses XPath to parse tabular data from a webpage. When run, it parses the data from the tables flawlessly. The only thing I can't get is the table headers, i.e. the contents of the th tags. With a CSS selector I could have used .cssselect("th,td"), but with XPath I got stuck. Any help on how to parse the data from the th tags as well would be highly appreciated.

Here is the script, which fetches everything from the different tables except the data within the th tags:

import requests
from lxml.html import fromstring

response = requests.get("https://fantasy.premierleague.com/player-list/")
tree = fromstring(response.text)
for row in tree.xpath("//*[@class='ism-table']//tr"):
    tab_d = row.xpath('.//td/text()')
    print(tab_d)

What is desired output? Do you want to get th nodes along with td from each tr? — Andersson, Dec 23 '17 at 22:27
Apologies in advance to both of the XPath giants who cared to provide me with excellent solutions. It's hard to choose one solution over the other. However, I'm selecting as accepted the answer I received first. — SIM, Dec 24 '17 at 06:05

score 1 · Accepted Answer · answered Dec 23 '17 at 22:34

I'm not sure I get your point, but if you want to fetch both th and td nodes with a single XPath, you can try to replace

tab_d = row.xpath('.//td/text()')

with

tab_d = row.xpath('.//*[name()="th" or name()="td"]/text()')
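To make the behaviour of the name() predicate concrete, here is a minimal sketch on a small inline table. The HTML and the variable names (HTML, strict, loose) are made up for illustration and are not taken from the real page. It compares name() against each tag separately, and also shows what happens with the shortened form name()=("th" or "td"): under XPath 1.0 rules, ("th" or "td") evaluates to the boolean true, so that predicate keeps every descendant element rather than only th and td.

```python
from lxml.html import fromstring

# Hypothetical stand-in for the real page; note the <sup> nested inside a cell.
HTML = """<html><body>
<table class="ism-table">
  <tr><th>Player</th><th>Team</th></tr>
  <tr><td>Kane</td><td>Spurs<sup>1</sup></td></tr>
</table>
</body></html>"""

tree = fromstring(HTML)
rows = tree.xpath("//*[@class='ism-table']//tr")

# name() is compared against each tag name separately,
# so only th and td elements contribute text nodes.
strict = [row.xpath('.//*[name()="th" or name()="td"]/text()') for row in rows]

# ("th" or "td") is a boolean expression that is always true in XPath 1.0,
# so this predicate matches every descendant element, including <sup>.
loose = [row.xpath('.//*[name()=("th" or "td")]/text()') for row in rows]

print(strict)
print(loose)
```

On a table whose rows contain nothing but plain th and td cells the two forms return the same lists, which is why the difference is easy to miss.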

Andersson


A little clarity on how the term `name` appears within the XPath out of nowhere would be much appreciated, @ sir Andersson. – SIM Dec 24 '17 at 06:09
The [`name()` function](https://developer.mozilla.org/en-US/docs/Web/XPath/Functions/name) (or [`local-name()`](https://stackoverflow.com/questions/2462248/what-is-the-difference-between-name-and-local-name)) can be used to check the string representation of a node's name – Andersson Dec 24 '17 at 08:48

kjhughes · Answer 2 · 2017-12-24T00:30:20.070

1

Change

.//td/text()

to

.//*[self::td or self::th]/text()

to include th elements too.

Note that it would be reasonable to assume that both td and th are immediate children of the tr context node, so you might further simplify your XPath to this:

*[self::td or self::th]/text()
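As a rough illustration of the self:: axis, the sketch below runs both forms against a small inline table. The HTML and the variable names (HTML, results) are hypothetical and only stand in for the real page. The predicate [self::td or self::th] keeps a candidate element only if the element itself is a td or a th. The leading .// searches all descendants of the row, while the bare * looks at direct children only.

```python
from lxml.html import fromstring

# Hypothetical stand-in for the real page.
HTML = """<html><body>
<table class="ism-table">
  <tr><th>Player</th><th>Team</th></tr>
  <tr><td>Kane</td><td>Spurs</td></tr>
</table>
</body></html>"""

tree = fromstring(HTML)

results = []
for row in tree.xpath("//*[@class='ism-table']//tr"):
    # Descendant form: any td/th anywhere below the current tr.
    descendant_cells = row.xpath('.//*[self::td or self::th]/text()')
    # Child form: only td/th that are immediate children of the tr.
    child_cells = row.xpath('*[self::td or self::th]/text()')
    results.append((descendant_cells, child_cells))

for descendant_cells, child_cells in results:
    print(descendant_cells, child_cells)
```

When every cell is a direct child of its row, as assumed here, both expressions return the same text. They would differ only if a cell contained a nested table with its own td or th elements.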

edited Dec 24 '17 at 00:30

answered Dec 24 '17 at 00:23


Thanks sir kjhughes for your solution. It did the trick flawlessly as well. This is the first time I came across this term `self` within any xpath. I'm not sure I'll understand this style myself. +1 for your effective solution. – SIM Dec 24 '17 at 06:13
