4

I have an HTML table (I didn't create it, I'm just using it) with many rows and a few columns.

I want to get some of the data into a string to use as a tooltip. The way I am doing it now is reading the contents of the HTML file as a string and using string manipulation to get the data I want.

This is probably a very bad idea, so I was wondering whether there is an API I could use to read the text from a specific row and column in an HTML file (for example, row 2, column 2). I would prefer not to use an external library (.dll), but I will if there is no other way.

Any ideas?

Bobby
dnclem

3 Answers

6

HTML Agility Pack

There are some good examples of how to use the HTML Agility Pack.

Refer to the links posted by rtpHarry in this answer.

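An example based on the one from the CodePlex site, showing how you would fix all hrefs in an HTML file using the HTML Agility Pack (adjusted here to use DocumentNode and Attributes, as in current releases of the library):

 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // SelectNodes returns null when nothing matches, so check before iterating.
 HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
     foreach (HtmlNode link in links)
     {
         HtmlAttribute att = link.Attributes["href"];
         att.Value = FixLink(att); // FixLink is your own helper that returns the corrected URL
     }
 }
 doc.Save("file.htm");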
Community
Jagmag
2

One way is to use a library such as the Html Agility Pack to load the HTML document, and then use its DOM API or XPath to navigate to the required node and read its content. This may get you started with the Agility Pack: How to use HTML Agility pack
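As a rough sketch of the XPath route for the "row 2, column 2" case from the question: the class and method names below are made up for illustration, and the code assumes the HtmlAgilityPack NuGet package is referenced, that the first table in the document is the one you want, and that the cells are td elements.

```csharp
using System;
using HtmlAgilityPack; // external NuGet package, not part of .NET itself

// Hypothetical helper, for illustration only.
public static class HtmlTableReader
{
    // Returns the text of the cell at the given 1-based row and column of the
    // first table in the markup, or null if there is no such cell.
    public static string GetCellText(string html, int row, int column)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath positions are 1-based. tr[n] counts rows within their parent,
        // so with thead/tbody sections the count restarts in each section.
        string xpath = string.Format("(//table)[1]//tr[{0}]/td[{1}]", row, column);

        HtmlNode cell = doc.DocumentNode.SelectSingleNode(xpath);
        if (cell == null)
        {
            return null;
        }

        // InnerText still contains entities such as &amp;, so decode them.
        return HtmlEntity.DeEntitize(cell.InnerText).Trim();
    }
}
```

Calling HtmlTableReader.GetCellText(File.ReadAllText("file.htm"), 2, 2) would then give a string you can assign to the tooltip, or null if the table has no such cell.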

Lastly, if your HTML is XHTML (or otherwise well-formed XML), you can use the XML libraries built into .NET to do the manipulation.
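If the markup really is well-formed, something along these lines might work with only the built-in System.Xml.Linq classes. The class and method names are hypothetical, and the sketch assumes a single, non-nested table and no undeclared HTML entities (an XML parser rejects things like &nbsp; unless a DTD declares them).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class XhtmlTableReader
{
    // Returns the text of the cell at the given 1-based row and column of the
    // first table in a well-formed XHTML string, or null if there is no such cell.
    public static string GetCellText(string xhtml, int row, int column)
    {
        // Throws XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);

        // Compare on LocalName so the lookup works whether or not the
        // document declares the XHTML namespace.
        XElement table = doc.Descendants()
            .FirstOrDefault(e => e.Name.LocalName == "table");
        if (table == null) return null;

        // Rows in document order; a nested table would add its rows here too.
        XElement tr = table.Descendants()
            .Where(e => e.Name.LocalName == "tr")
            .ElementAtOrDefault(row - 1);
        if (tr == null) return null;

        XElement cell = tr.Elements()
            .Where(e => e.Name.LocalName == "td" || e.Name.LocalName == "th")
            .ElementAtOrDefault(column - 1);

        return cell == null ? null : cell.Value.Trim();
    }
}
```

This avoids the external .dll, but it only works for as long as the page stays valid XML; ordinary "tag soup" HTML will make XDocument.Parse throw.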

Community
VinayC
0

Actually, I think the approach you've taken is a fine idea.

That's probably how I'd do it. There might be libraries to do it, but they'd just be doing the same thing.

It would be better to get the data from the source rather than parsing it from an HTML page. But if that's all you have, that's what you need to do.

Why do you think it's a bad idea?

Jonathan Wood
  • jonathon, i think it's a bad idea from the perspective of having to have heaps and heaps of exception testing due to incorrect formatting, unexpected characters, etc. a one line reference to a library (such as agility pack) goes a long, long way. plus there are tons of examples of complex usage of agility pack on the web. as an aside, i recently had to 'filch' some accommodation details from a tourist website for a client (i won't go into details, but the client was a former partner of the business. i'm only the messenger :-)). ... cont-> – jim tollan Dec 15 '10 at 10:22
  • this involved querying the paged data, drilling down to the detail for each entry, grabbing the core data, then repeating for each page of paged data. had this been done without the agility pack, i'd have had to determine the inconsistencies for a wide range of scenarios, whereas all that was required was a firm idea of the required structure. just my 2 cent support for a library option. – jim tollan Dec 15 '10 at 10:25
  • jim: Seems like I've been writing a lot of HTML parsing code lately. If it's done well, there's no heaps and heaps of exception testing. At any rate, any testing that is necessary would still be necessary regardless of whether you use someone else's tool or write it yourself. If I write it myself, it tends to be much lighter weight and does exactly what I want. – Jonathan Wood Dec 15 '10 at 15:31