
I want to evaluate this XPath query with lxml in Python:

.//*[@id='content_top']/article/div/table/tbody/tr[5]/td/p/text()

I checked the XPath query in FirePath (the Firebug extension for XPath) and it works, but my Python code returns nothing. Here's the source:

from lxml import html
import requests

page = requests.get("http://www.scienzeetecnologie.uniparthenope.it/avvisi.html")
tree = html.fromstring(page.text)
avvisi = tree.xpath(".//*[@id='content_top']/article/div/table/tbody/tr[5]/td/p/text()")
print(avvisi)

The output is an empty list: [].

Anand S Kumar
cdm

1 Answer


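There is no actual <tbody> element in the source HTML; it is only present in the DOM because the browser's HTML parser inserts it. lxml's parser does not add it, so an XPath that mentions tbody finds nothing in lxml's tree.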
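A minimal, self-contained check of this, assuming a small hypothetical snippet in place of the real page (no network access needed): lxml keeps the table exactly as written, so the path with tbody matches nothing while the path without it finds the text.

```python
from lxml import html

# Hypothetical markup: <tr> sits directly inside <table> with no <tbody>,
# mirroring what the page's source HTML contains.
SOURCE = """<html><body>
<div id="content_top"><table>
<tr><td><p>first row</p></td></tr>
</table></div>
</body></html>"""

tree = html.fromstring(SOURCE)
table = tree.xpath(".//*[@id='content_top']/table")[0]

# Tags of the table's direct child elements as lxml parsed them.
child_tags = [child.tag for child in table]
print(child_tags)

# Same query with and without the browser-only tbody step.
with_tbody = tree.xpath(".//*[@id='content_top']/table/tbody/tr/td/p/text()")
without_tbody = tree.xpath(".//*[@id='content_top']/table/tr/td/p/text()")
print(with_tbody, without_tbody)
```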

Firebug displays the DOM, and I am guessing that FirePath, being a Firebug extension, evaluates XPath against that DOM rather than against the source HTML.
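Instead of copying a path from FirePath, one option is to ask lxml for the path as it sees it, using `getroottree().getpath()`. This is a sketch on another hypothetical snippet; locating the <p> by its text is just an assumption made for the example.

```python
from lxml import html

# Hypothetical page fragment; on a real page, fetch and parse it first.
SOURCE = """<html><body>
<div id="content_top"><table>
<tr><td><p>target text</p></td></tr>
</table></div>
</body></html>"""

tree = html.fromstring(SOURCE)

# Find the <p> by its text, then get its absolute path in lxml's own tree.
paragraph = tree.xpath("//p[contains(text(), 'target')]")[0]
lxml_path = paragraph.getroottree().getpath(paragraph)
print(lxml_path)  # has no tbody step, unlike a path copied from the browser
```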

For a more detailed explanation of <tbody> and why Firebug displays it, see the answers to the SO questions "Why does firebug add <tbody> to <table>?" and "Why do browsers insert tbody element into table elements?"


In your case, removing tbody from the XPath should make it work. For example:

avvisi = tree.xpath(".//*[@id='content_top']/article/div/table/tr[5]/td/p/text()")
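If the same expression has to work both in lxml and in a browser DOM (where tbody is present), one alternative, not part of the answer above, is to use `//` between table and tr. The sketch below checks this on two hypothetical pages with five rows each. Note that `tr[5]` still means the fifth tr within its own parent, and that `table//tr` would also match rows of any nested tables.

```python
from lxml import html


def make_doc(with_tbody: bool) -> str:
    """Build a hypothetical page with five rows, optionally wrapped in <tbody>."""
    rows = "".join(f"<tr><td><p>row {i}</p></td></tr>" for i in range(1, 6))
    body = f"<tbody>{rows}</tbody>" if with_tbody else rows
    return (
        "<html><body><div id='content_top'><article><div>"
        f"<table>{body}</table>"
        "</div></article></div></body></html>"
    )


# table//tr[5] selects the fifth <tr> child of whichever element holds the
# rows, whether that is <table> itself or an explicit <tbody>.
QUERY = ".//*[@id='content_top']/article/div/table//tr[5]/td/p/text()"

results = {}
for flag in (False, True):
    tree = html.fromstring(make_doc(flag))
    results[flag] = tree.xpath(QUERY)
    print(flag, results[flag])
```

The explicit `table/tr[5]` path from the answer is stricter; the `//` form trades that strictness for tolerance of an optional tbody.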
Community
Anand S Kumar