2

Let's assume I get an HTML table as a string, and I want, using C#, to break it down into its elements (td, div if any, etc.) and read each element's attributes, such as 'style', 'class'...

My goal is to take an HTML table and build a table object of my own from it, retaining most (if not all) of the table's attributes. The only way I can think of doing this looks like a coding nightmare: split the string into its 'tr' and 'td' elements, then dig through each one looking for its attributes and parse them into something I can work with. Is there any other way?

Example:

string someString = "<div><table cellpadding=\"0\" cellspacing=\"0\"><tr><td style=\"border-bottom:1px solid transparent;width:1px;font-size:1px;height:1px;line-height:1px;\"><div class=\"someClass\">...";

will become (in my hypothetical object):

MyTable table = new MyTable
{
   CellPadding = "0",
   ...
}

MyTableRow row = new MyTableRow 
{
   Cell[0].Style.BorderBottom = "1px solid transparent",
   Cell[0].Style.Width = "1px",
   ...
}

you get the idea :)

ShayLivyatan

2 Answers

3

There's a library called HtmlAgilityPack that parses HTML documents (including malformed ones) and gives you access to the resulting DOM from C# code.
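As a rough sketch of what that could look like for the table in the question: the code below loads the string, selects every td inside a table, and lists each cell's attributes. `TableDump`, `Dump` and `ParseStyle` are names made up for this example, not part of the library, and `ParseStyle` is a deliberately naive splitter that assumes no `;` appears inside a value (for instance inside a `url(...)`).

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party package, assumed to be referenced

public static class TableDump
{
    // Hypothetical helper: splits "a:b;c:d" into a property -> value map.
    // Naive on purpose; it does not handle ';' inside a value.
    public static Dictionary<string, string> ParseStyle(string style)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var declarations = style.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var declaration in declarations)
        {
            var colon = declaration.IndexOf(':');
            if (colon <= 0) continue; // skip declarations without a property name
            var name = declaration.Substring(0, colon).Trim();
            var value = declaration.Substring(colon + 1).Trim();
            result[name] = value;
        }
        return result;
    }

    // Prints every attribute of every table cell found in the markup.
    public static void Dump(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var cells = doc.DocumentNode.SelectNodes("//table//td");
        if (cells == null) return; // SelectNodes returns null when nothing matches

        foreach (var cell in cells)
        {
            foreach (var attribute in cell.Attributes)
                Console.WriteLine("{0} = {1}", attribute.Name, attribute.Value);

            // Break the inline style into individual properties.
            var style = cell.GetAttributeValue("style", "");
            foreach (var pair in ParseStyle(style))
                Console.WriteLine("  {0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

From there, filling your own MyTable / MyTableRow objects is a matter of copying the values you care about instead of printing them.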

Marcin Hoppe
  • Thanks, it seems to be the best way. I've been googling for a solution for this for quite some time but I didn't stumble upon the **HtmlAgilityPack** for some odd reason. Maybe I wasn't asking Google the right question? Anyhows, equipped with the **HtmlAgilityPack** I've found a good way to do what I wanted to do, in Marc's answer here: http://stackoverflow.com/questions/655603/html-agility-pack-parsing-tables – ShayLivyatan Mar 22 '12 at 10:18
  • I wasn't aware of this component before this posting, so thanks for bringing it to my attention. – deadlyvices Mar 22 '12 at 10:25
0

I would also suggest you have a look at SGMLReader, which is a drop-in replacement for an XmlReader but can also handle badly-formed HTML.
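Because it is an XmlReader, you can hand it to the regular XML APIs. A minimal sketch, assuming the SgmlReader package is referenced and that the elements come out without an XML namespace; `SgmlTableReader`, `Load` and `CellStyles` are hypothetical names used only for this example:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using Sgml; // namespace of the SgmlReader package (third-party)

public static class SgmlTableReader
{
    // Reads loosely formed HTML into an XDocument via SgmlReader.
    public static XDocument Load(string html)
    {
        using (var reader = new SgmlReader())
        {
            reader.DocType = "HTML";                  // use the built-in HTML DTD
            reader.CaseFolding = CaseFolding.ToLower; // normalise tag names to lower case
            reader.InputStream = new StringReader(html);
            return XDocument.Load(reader);
        }
    }

    // Returns the raw style attribute of every td ("" when a cell has none).
    // Assumes the td elements are not placed in an XML namespace.
    public static List<string> CellStyles(string html)
    {
        return Load(html)
            .Descendants("td")
            .Select(td => (string)td.Attribute("style") ?? "")
            .ToList();
    }
}
```

Once the markup is in an XDocument, the rest is ordinary LINQ to XML.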

deadlyvices
  • Hmm... looks like another nice way to achieve what I wanted... BTW, that link you provided leads to an old version of SGMLReader. You can find the newer versions here: http://developer.mindtouch.com/SgmlReader – ShayLivyatan Mar 22 '12 at 10:24
  • It's 'horses for courses'. If you find my comment useful then a +1 is always appreciated :-) Beware of the newer version: it's GPL, and this comes with a lot of legal baggage. – deadlyvices Mar 22 '12 at 10:27
  • Of course I've found it useful, but unfortunately I cannot vote up with my humble rank... And thanks for the GPL heads-up, I didn't notice that as I was mainly looking in SGMLReader's code examples :) – ShayLivyatan Mar 22 '12 at 10:50