I need some advice, and possibly code examples, for parsing an HTML table from a website. I'm using the WebClient class to download the HTML from an address. I then need to find the table I want the data from. For example, if the table is <table id="cia_list">, I want to loop through its <td> tags and get just the text inside them. What would be the best way to approach this?
http://htmlagilitypack.codeplex.com/ – SLaks Feb 29 '12 at 17:05
1 Answer
In the past I have converted the HTML to XML and then used XSLT to parse the results. If this is an approach you want to take I would recommend looking at SGMLReader, which will handle the conversion.
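As a rough illustration of the general idea (use a real parser, then walk the structure) here is a minimal sketch. It is not SGMLReader or XSLT, and it is written in Python with only the standard library rather than in .NET, so treat it as a language-neutral outline. The names TableCellExtractor, extract_cells and SAMPLE_HTML are made up for this example; in practice the HTML string would be whatever the download step returned.

```python
from html.parser import HTMLParser


class TableCellExtractor(HTMLParser):
    """Collect the text of every <td> inside the table with a given id.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, table_id):
        super().__init__(convert_charrefs=True)
        self.table_id = table_id
        self.table_depth = 0    # > 0 while inside the target table
        self.in_cell = False
        self.cells = []
        self._buffer = []

    def _flush_cell(self):
        # Store the text gathered for the current cell, if any.
        if self.in_cell:
            self.cells.append("".join(self._buffer).strip())
            self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.table_depth > 0:
                # Nested table: count it so its </table> does not
                # end the target table early.
                self.table_depth += 1
            elif dict(attrs).get("id") == self.table_id:
                self.table_depth = 1
        elif tag == "td" and self.table_depth > 0:
            self._flush_cell()  # tolerate a missing </td>
            self.in_cell = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
            if self.table_depth == 0:
                self._flush_cell()
        elif tag == "td":
            self._flush_cell()

    def handle_data(self, data):
        # Inner tags such as <b> are ignored; only their text is kept.
        if self.in_cell:
            self._buffer.append(data)


def extract_cells(html, table_id):
    """Return the text of each <td> in the table whose id is table_id."""
    parser = TableCellExtractor(table_id)
    parser.feed(html)
    parser.close()
    return parser.cells


# Made-up sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
<table id="other"><tr><td>ignore me</td></tr></table>
<table id="cia_list">
  <tr><td>Alpha</td><td>Beta &amp; Gamma</td></tr>
  <tr><td><b>Delta</b> 4</td></tr>
</table>
</body></html>
"""

cells = extract_cells(SAMPLE_HTML, "cia_list")
for text in cells:
    print(text)
```

The parser tracks whether it is inside the target table and inside a cell, so markup inside a cell is dropped and entities such as &amp; are decoded. Nested tables are only roughly handled here, and a dedicated HTML library will cope better with badly malformed pages.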
People will often attempt to use regex to do what you are talking about. This is something I typically advise against. Here is an amusing post that goes over some of the reasons not to do this:

Abe Miessler
Thank you for the link about not using regex. I had considered that approach as a viable option. – broke Feb 29 '12 at 17:15
Yeah, it's worked well for me in the past. The link that @Slaks pointed out looked promising too, so you might check that out also. – Abe Miessler Feb 29 '12 at 20:04