
I need some advice, and possibly code examples, for parsing an HTML table from a website. I'm using the WebClient class to download the HTML from an address. I then need to find the table I want the data from. For example, if the table is <table id="cia_list">, I want to loop through its <td> tags and get just the text inside them. What would be the best way to approach this?
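A minimal sketch of the loop described above, using Python's standard library as a stand-in for the .NET WebClient setup. It is not the asker's code. The names TableCellExtractor, table_cells and fetch_html are hypothetical. It assumes every cell is closed with an explicit </td>, that cells contain no nested tables, and that the downloaded page is UTF-8.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableCellExtractor(HTMLParser):
    """Collects the text of each <td> inside the <table> whose id matches table_id.

    Assumes every cell is closed with </td> and that cells do not contain
    nested tables.
    """

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.table_depth = 0  # > 0 while inside the target table
        self.in_cell = False
        self.buffer = []
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.table_depth:
                self.table_depth += 1  # a nested table must not end the target early
            elif dict(attrs).get("id") == self.table_id:
                self.table_depth = 1  # entered the target table
        elif tag == "td" and self.table_depth:
            self.in_cell = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self.in_cell:
            self.cells.append("".join(self.buffer).strip())
            self.in_cell = False
        elif tag == "table" and self.table_depth:
            self.table_depth -= 1

    def handle_data(self, data):
        # Text inside child tags such as <b> or <a> is kept as well
        if self.in_cell:
            self.buffer.append(data)


def table_cells(html, table_id):
    parser = TableCellExtractor(table_id)
    parser.feed(html)
    parser.close()
    return parser.cells


def fetch_html(url):
    # Rough analogue of WebClient.DownloadString; assumes a UTF-8 page
    with urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    sample = """
    <table id="other"><tr><td>skip me</td></tr></table>
    <table id="cia_list">
      <tr><td>r1c1</td><td> r1c2 </td></tr>
      <tr><td><b>r2c1</b></td><td>r2c2</td></tr>
    </table>
    """
    print(table_cells(sample, "cia_list"))  # prints ['r1c1', 'r1c2', 'r2c1', 'r2c2']
```

With a live page this would be table_cells(fetch_html(url), "cia_list"). A streaming parser like this tolerates markup that is not well-formed XML, which is the main reason to prefer it over string matching.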

btlog
broke

1 Answer


In the past I have converted the HTML to XML and then used XSLT to extract the data. If you want to take this approach, I would recommend looking at SGMLReader, which handles the conversion.
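Once the page is well-formed XML, the table can be queried by path instead of by hand. The sketch below swaps XSLT for a plain path query using Python's xml.etree.ElementTree, which supports a limited XPath subset. It assumes the HTML-to-XML conversion has already happened and that elements carry no namespace (XHTML with xmlns="http://www.w3.org/1999/xhtml" would need namespace-qualified tag names). cells_from_xml is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def cells_from_xml(xml_text, table_id):
    """Return the text of each <td> in the table with the given id.

    Assumes xml_text is already well-formed XML (for example, the output of an
    HTML-to-XML conversion step) and that elements carry no XML namespace.
    """
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports attribute predicates like [@id='...']
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        return []
    # itertext() also picks up text inside child elements such as <b>
    return ["".join(td.itertext()).strip() for td in table.iter("td")]


if __name__ == "__main__":
    xhtml = (
        "<html><body><table id='cia_list'>"
        "<tr><td>r1c1</td><td><b>r1</b>c2</td></tr>"
        "</table></body></html>"
    )
    print(cells_from_xml(xhtml, "cia_list"))  # prints ['r1c1', 'r1c2']
```

The trade-off is that this step fails outright on markup that is not well-formed, which is why a converter such as SGMLReader is needed first.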

People often try to use regular expressions for this kind of task, which I generally advise against. Here is an amusing post that covers some of the reasons not to:

RegEx match open tags except XHTML self-contained tags
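To make those reasons concrete, here is a small Python sketch of a typical naive pattern failing on two ordinary pieces of markup. The pattern NAIVE_TD and the function naive_cells are hypothetical illustrations of a first attempt, not code from the question.

```python
import re

# A hypothetical "first attempt": grab whatever sits between <td> and </td>
NAIVE_TD = re.compile(r"<td>(.*?)</td>")


def naive_cells(html):
    return NAIVE_TD.findall(html)


if __name__ == "__main__":
    # A cell with an attribute is skipped entirely
    print(naive_cells('<tr><td class="name">r1c1</td><td>r1c2</td></tr>'))
    # prints ['r1c2']

    # A nested table makes the lazy match stop at the inner </td>
    print(naive_cells("<td>outer<table><tr><td>inner</td></tr></table></td>"))
    # prints ['outer<table><tr><td>inner']
```

Each fix to the pattern (allowing attributes, counting nesting) moves it closer to a real parser, which is the point the linked post makes.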

Abe Miessler