Use C# to grab text from an HTML table

Question

I need some advice, and possibly code examples, for parsing an HTML table from a website. I'm using the WebClient class to download the HTML from an address. I then need to find the table I want the data from. For example, if the table is <table id="cia_list">, I want to loop through its <td> tags and get just the text inside them. What would be the best way to approach this?

http://htmlagilitypack.codeplex.com/ – SLaks Feb 29 '12 at 17:05

score 4 · Accepted Answer · edited May 23 '17 at 11:48


In the past I have converted the HTML to XML and then used XSLT to extract the data I needed. If you want to take this approach, I recommend looking at SGMLReader, which handles the HTML-to-XML conversion.
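As a rough sketch of the HTML-to-XML route, the snippet below feeds the HTML through SgmlReader and then pulls the cell text out with LINQ to XML rather than XSLT, simply because it is shorter to show. It assumes the SgmlReader package is referenced and exposes the `Sgml` namespace; the `TableScraper` class, the `GetCellText` helper, and the sample markup are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using Sgml; // assumption: SgmlReader is referenced under this namespace

public static class TableScraper
{
    // Hypothetical helper: converts loosely formed HTML to XML via SgmlReader,
    // then returns the trimmed text of every <td> inside the table with the given id.
    public static List<string> GetCellText(string html, string tableId)
    {
        using (var input = new StringReader(html))
        {
            var sgml = new SgmlReader();
            sgml.DocType = "HTML";                  // treat the input as HTML
            sgml.CaseFolding = CaseFolding.ToLower; // normalise tag names to lower case
            sgml.InputStream = input;

            XDocument doc = XDocument.Load(sgml);

            // Compare local names so an xmlns on the root element does not matter.
            return doc.Descendants()
                .Where(e => e.Name.LocalName == "table"
                         && (string)e.Attribute("id") == tableId)
                .SelectMany(t => t.Descendants()
                    .Where(e => e.Name.LocalName == "td"))
                .Select(td => td.Value.Trim())
                .ToList();
        }
    }

    public static void Main()
    {
        // Sample markup standing in for a real page. In practice the string
        // would come from something like new WebClient().DownloadString(address).
        string html = "<html><body><table id=\"cia_list\">"
                    + "<tr><td>Alpha</td><td>Bravo</td></tr>"
                    + "</table></body></html>";

        foreach (string text in GetCellText(html, "cia_list"))
        {
            Console.WriteLine(text);
        }
    }
}
```

Once the document is loaded as XML, the same data could just as well be selected with an XSLT stylesheet or an XPath query; the conversion step is the part SgmlReader takes care of.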

People will often attempt to use regex to do what you are talking about. This is something I typically advise against. Here is an amusing post that goes over some of the reasons not to do this:

RegEx match open tags except XHTML self-contained tags
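For comparison, here is a minimal sketch of a parser-based alternative to regex using the Html Agility Pack linked in the comments on the question. It assumes the HtmlAgilityPack package is referenced; the `AgilityTableScraper` class, the `GetCellText` helper, and the sample markup are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is referenced

public static class AgilityTableScraper
{
    // Hypothetical helper: returns the decoded, trimmed text of every <td>
    // inside the table with the given id. Assumes tableId contains no
    // single quote, since it is placed directly into the XPath expression.
    public static List<string> GetCellText(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection cells =
            doc.DocumentNode.SelectNodes("//table[@id='" + tableId + "']//td");

        var result = new List<string>();
        if (cells == null)
        {
            // SelectNodes returns null rather than an empty collection when nothing matches.
            return result;
        }

        foreach (HtmlNode cell in cells)
        {
            // InnerText drops nested tags; DeEntitize turns entities such as &amp; back into characters.
            result.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
        }
        return result;
    }

    public static void Main()
    {
        // Sample markup standing in for the downloaded page.
        string html = "<table id=\"cia_list\">"
                    + "<tr><td>Alpha</td><td><b>Bravo</b> &amp; Charlie</td></tr>"
                    + "</table>";

        foreach (string text in GetCellText(html, "cia_list"))
        {
            Console.WriteLine(text);
        }
    }
}
```

Because the library builds a node tree from the markup, nested tags and unclosed elements are handled by the parser instead of by an increasingly fragile pattern.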


answered Feb 29 '12 at 17:09

Abe Miessler


Thank you for the link about not using regex. I had considered that approach as a viable option. – broke Feb 29 '12 at 17:15
Many people do. Unfortunately, HTML does not cooperate. – Abe Miessler Feb 29 '12 at 17:35
SGMLReader is pretty awesome. Thank you – broke Feb 29 '12 at 20:03
Yeah, it's worked well for me in the past. The link that @Slaks pointed out looked promising too, so you might check that out also. – Abe Miessler Feb 29 '12 at 20:04
