XPath: how to extract one td at a time from a tbody in HTML using HTML Agility Pack

Question

I am trying to parse the table at the following Google Finance URL:

http://www.google.com/finance/historical?q=BOM:533278

I am trying to extract only the values in the Close column. But when I try the XPath

`hd.DocumentNode.SelectSingleNode("//td[@class='rgt']")`

I get the text of all the `td` nodes whose `class` attribute is `rgt` together in a single node's `InnerText`.

I need the values one at a time, not all at once. I must be doing something silly here. Thank you.

The actual XPath found using Firebug is as follows:

`/html/body/div/div/div[3]/div[2]/div/div[2]/div[2]/div/form/div[2]/table/tbody/tr[2]/td[5]`

But somehow, after the `form` tag, HtmlAgilityPack returns a null node. I never thought this would take so long to implement.

This is a **FAQ: browsers add mandatory elements to DOM** (such as `head` and `tbody`). Your input source doesn't have any `tbody` element. (comment, Mar 06 '11 at 21:12)

score 4 · Accepted Answer · edited May 23 '17 at 12:22

If you're using Firebug or any Firefox extension (like XPather) to obtain the XPath of the elements you need to parse, you might need to remove the `tbody` steps from the XPath.

Take a look at the following answer here on SO: Why does Firebug add <tbody> to <table>?

If you're using HtmlAgilityPack, the XPath returned by Firebug, or by any other Firefox-based tool, may not match, because the HTML source you're parsing can differ from the document Firefox shows.

Sometimes it can be useful to open the same page in Internet Explorer 8 and use its Developer Tools (F12) to do what you're doing with Firebug; failing that, use another tool such as HAP Explorer, which can be downloaded from the HtmlAgilityPack page.
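To see the `tbody` mismatch directly, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package and C# 9 top-level statements. The inline HTML is a hypothetical stand-in for the Google Finance page whose source, like the one described in this thread, has no `tbody`. The first two queries are the thread's own XPath with and without the `tbody` step; the third, `table//tr[2]`, is an illustrative variant (not from the answers) that matches whether or not a `tbody` is present.

```csharp
// Sketch only: needs the HtmlAgilityPack NuGet package and C# 9+ top-level statements.
using System;
using HtmlAgilityPack;

// Hypothetical stand-in for the page source: the table has no <tbody>,
// although Firebug's DOM view would show one.
const string html = @"<html><body><div id='prices'><table>
<tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th></tr>
<tr><td class='lm'>Mar 4, 2011</td><td class='rgt'>10.00</td><td class='rgt'>11.00</td><td class='rgt'>9.50</td><td class='rgt'>10.50</td></tr>
</table></div></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Path as copied from Firebug: HtmlAgilityPack parses the raw source and
// does not insert tbody, so this step matches nothing.
var withTbody = doc.DocumentNode.SelectSingleNode(
    "//div[@id='prices']/table/tbody/tr[2]/td[5]");

// Same path with the tbody step removed: matches the raw source.
var withoutTbody = doc.DocumentNode.SelectSingleNode(
    "//div[@id='prices']/table/tr[2]/td[5]");

// table//tr[2] matches with or without a tbody between table and tr.
var either = doc.DocumentNode.SelectSingleNode(
    "//div[@id='prices']/table//tr[2]/td[5]");

Console.WriteLine("with tbody:    " + (withTbody?.InnerText ?? "null"));
Console.WriteLine("without tbody: " + (withoutTbody?.InnerText ?? "null"));
Console.WriteLine("either:        " + (either?.InnerText ?? "null"));
```

Because `//` searches all descendants, `table//tr[2]` can also pick up rows of a nested table; on pages with nested tables, deleting the `tbody` step explicitly is the stricter option.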

score 1 · Answer 2 · edited Mar 11 '11 at 05:40


There are many ways to do it. Here is one solution, based on the data row's `td` that has the `lm` class:

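```csharp
using System;
using HtmlAgilityPack;

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
// ... load the doc ...

// From each cell with class 'lm', go up to its row and take that row's 5th td (Close).
// SelectNodes returns null rather than an empty collection when nothing matches.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td[@class='lm']/../td[5]");
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
        Console.WriteLine("node=" + node.InnerText);
}
```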
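For reference, here is a self-contained version of the same query, a sketch assuming HtmlAgilityPack, C# 9 top-level statements, and hypothetical sample rows in place of the live page. It reads each Close cell separately and converts it to `decimal`; the entity decoding and the invariant-culture number parsing are additions for illustration, not part of the original answer.

```csharp
// Sketch only: needs the HtmlAgilityPack NuGet package and C# 9+ top-level statements.
using System;
using System.Collections.Generic;
using System.Globalization;
using HtmlAgilityPack;

// Hypothetical rows shaped like the question's table:
// Date (class 'lm'), then Open, High, Low, Close (class 'rgt'); no tbody.
const string html = @"<div id='prices'><table>
<tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th></tr>
<tr><td class='lm'>Mar 4, 2011</td><td class='rgt'>1,010.00</td><td class='rgt'>1,020.00</td><td class='rgt'>995.50</td><td class='rgt'>1,005.25</td></tr>
<tr><td class='lm'>Mar 3, 2011</td><td class='rgt'>990.00</td><td class='rgt'>1,001.00</td><td class='rgt'>985.00</td><td class='rgt'>999.75</td></tr>
</table></div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// One node per data row: from each 'lm' cell go up to its <tr>, then take the 5th <td>.
HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//td[@class='lm']/../td[5]");

var closes = new List<decimal>();
if (cells != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode cell in cells)
    {
        // DeEntitize decodes entities such as &nbsp;; Trim drops surrounding whitespace.
        string text = HtmlEntity.DeEntitize(cell.InnerText).Trim();
        // NumberStyles.Number accepts thousands separators such as "1,005.25".
        if (decimal.TryParse(text, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal value))
            closes.Add(value);
    }
}

foreach (decimal close in closes)
    Console.WriteLine(close);
```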

edited Mar 11 '11 at 05:40

Oscar Mederos


answered Mar 06 '11 at 16:54

Simon Mourier


score 0 · Answer 3 · answered Mar 05 '11 at 11:55


The XPath for the first cell in the Close column is `//div[@id='prices']/table/tbody/tr[2]/td[5]`, for the second one it's `//div[@id='prices']/table/tbody/tr[3]/td[5]`, and so on.

answered Mar 05 '11 at 11:55

Harri


`HtmlNode node = hd.DocumentNode.SelectSingleNode("//div[@id='prices']/table/tbody/tr[3]/td[5]");` This is not working; I am still getting a null reference. – Krishna Chaitanya M Mar 06 '11 at 04:41
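Harri's indexed XPath can be used to read one Close cell at a time. Below is a minimal sketch, assuming HtmlAgilityPack, C# 9 top-level statements, and hypothetical sample markup. The `tbody` step is dropped because HtmlAgilityPack parses the raw source, which (per the first comment on the question) has no `tbody`. The loop also stops at the first row index with no match, instead of dereferencing a null node.

```csharp
// Sketch only: needs the HtmlAgilityPack NuGet package and C# 9+ top-level statements.
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical sample markup, laid out like the question's table and, like the
// raw page source, without a <tbody> between <table> and <tr>.
const string html = @"<div id='prices'><table>
<tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th></tr>
<tr><td class='lm'>Mar 4, 2011</td><td>10.00</td><td>11.00</td><td>9.50</td><td>10.50</td></tr>
<tr><td class='lm'>Mar 3, 2011</td><td>9.80</td><td>10.10</td><td>9.60</td><td>9.90</td></tr>
</table></div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

var closes = new List<string>();
// tr[1] is the header row, so data starts at tr[2]; no tbody step in the path.
for (int row = 2; ; row++)
{
    HtmlNode cell = doc.DocumentNode.SelectSingleNode(
        $"//div[@id='prices']/table/tr[{row}]/td[5]");
    if (cell == null)
        break; // past the last row: stop instead of dereferencing null

    closes.Add(cell.InnerText.Trim());
    Console.WriteLine($"row {row}: {closes[closes.Count - 1]}");
}
```

Running one `SelectSingleNode` per row is simple but re-evaluates the path each time; for large tables a single `SelectNodes` call over all rows does the same work in one query.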
