I am trying to get started parsing html with lxml. I know from basic xpath that /
should select the root node, //body
should select the body element node wherever it is in the dom, etc. However I am getting an empty list for all of them.
from lxml import html
import urllib2
headers = {'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0'}
req = urllib2.Request("http://news.ycombinator.com", None, headers)
r = urllib2.urlopen(req).read()
x = html.fromstring(r)
x.xpath("/")
[]
EDIT:
For example, here is another valid xpath expression for that page which returns an empty list
x.xpath("/html/body/center/table/tbody/tr[3]/td/table/tbody/tr[1]/td[3]")
[]
# when it should have returned the following (as of this time)
# <td class="title"><a href="http://www.tomdalling.com/blog/modern-opengl/opengl-in-2014/">OpenGL in 2014</a><span class="comhead"> (tomdalling.com) </span></td>