
Quick one. I'm new to lxml and have spent quite a while trying to scrape text data from a particular site. The element structure is shown below:

http://tinypic.com/r/2iw7zaa/8

What I want to do is extract the 100,100 shown within the highlighted area. The statements I've tried are below. To test, I saved the site's source to a text file, test.txt, and also tried it with an .html extension:

from lxml import html
tree = html.parse('test.txt')
#value = tree.xpath('//*[@id="content"]/table[4]/tbody/tr[1]/td[2]')
#value = tree.xpath('//*[@id="content"]/table[4]/tbody/tr[1]/td[2]/text()')

All I get as a result is an empty list, []. Any help would be greatly appreciated.

PS: I commented out the two value statements because I'm only showing what I tried. I also tried a number of other XPath expressions similar to the ones above, but they were lost when the Python shell crashed.

PPS: Apologies for linking to the picture; I don't have enough reputation to post images directly.

Sighonide
Possible duplicate of [Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?](http://stackoverflow.com/questions/18241029/why-does-my-xpath-query-scraping-html-tables-only-work-in-firebug-but-not-the) – Jens Erat, Oct 12 '14 at 17:36

1 Answer


Try removing `/tbody` from the XPath.
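A minimal sketch of the fix, assuming a hypothetical page where the fourth table inside `#content` has `100,100` in the second cell of its first row. The real markup is only in the linked picture, so `SAMPLE_HTML` is an illustrative stand-in for test.txt. `html.parse` accepts a file-like object as well as a filename, so the sketch wraps the string in `io.StringIO` instead of reading a file from disk.

```python
import io

from lxml import html

# Hypothetical stand-in for the saved page (test.txt); the real markup is unknown.
SAMPLE_HTML = """<html><body>
<div id="content">
  <table><tr><td>a</td></tr></table>
  <table><tr><td>b</td></tr></table>
  <table><tr><td>c</td></tr></table>
  <table>
    <tr><td>Label</td><td>100,100</td></tr>
    <tr><td>Other</td><td>200,200</td></tr>
  </table>
</div>
</body></html>"""

# Mirrors html.parse('test.txt') without needing the file on disk.
tree = html.parse(io.StringIO(SAMPLE_HTML))

# Path as copied from the browser: lxml's tree has no <tbody>, so nothing matches.
with_tbody = tree.xpath('//*[@id="content"]/table[4]/tbody/tr[1]/td[2]/text()')

# Same path with the /tbody step removed.
without_tbody = tree.xpath('//*[@id="content"]/table[4]/tr[1]/td[2]/text()')

print(with_tbody)     # empty list: no match
print(without_tbody)  # list holding the cell text 100,100
```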

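The browser may insert `<tbody>` elements that are not in the raw HTML, so an XPath copied from the browser's developer tools can contain a `tbody` step that doesn't exist in the tree lxml builds from the raw source.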
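To see the difference, this sketch parses two hypothetical versions of the same table: one without `<tbody>` (as raw HTML is often served) and one with it (as a browser's inspector displays it). The helper `second_cell` is an illustrative name, not part of lxml. It uses `//` between `table` and `tr`, which matches whether or not a `tbody` sits in between; this is one common workaround, not the only one.

```python
from lxml import html

# Hypothetical markup: the raw source vs. what a browser's DOM inspector shows.
RAW = "<html><body><table><tr><td>Label</td><td>100,100</td></tr></table></body></html>"
WITH_TBODY = (
    "<html><body><table><tbody><tr><td>Label</td><td>100,100</td></tr>"
    "</tbody></table></body></html>"
)


def second_cell(doc):
    # '//' between table and tr matches a tr at any depth below the table,
    # so the same expression works with or without an intervening <tbody>.
    return doc.xpath('//table//tr[1]/td[2]/text()')


raw_doc = html.document_fromstring(RAW)
tbody_doc = html.document_fromstring(WITH_TBODY)

print(raw_doc.xpath('//tbody'))  # empty list: lxml did not add a tbody
print(second_cell(raw_doc))      # finds the cell in the raw markup
print(second_cell(tbody_doc))    # and in the tbody version
```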

Read more here and here.

chishaku