
I use C# and need to parse an HTML tag and read its attributes into key/value pairs. For example, given the following HTML snippet:

<DIV myAttribute style="BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none" id=my_ID anotherAttribNamedDIV class="someclass">

Please note that the attributes can be:
1. key="value" pairs, e.g. class="someclass"
2. key=value pairs, e.g. id=my_ID (no quotes around the value)
3. plain attributes, e.g. myAttribute, which have no value

I need to store them in a dictionary as key/value pairs, as follows:
key=myAttribute value=""
key=style value="BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"
key=id value="my_ID"
key=anotherAttribNamedDIV value=""
key=class value="someclass"

I am looking for regular expressions to do this.

Deduplicator
MPV
  • You can't parse [X]HTML with regex. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Homam Apr 11 '11 at 14:50
  • Don't use capitals for your html tags. – David Apr 11 '11 at 17:26

2 Answers

11

You can do this with the HtmlAgilityPack, which parses the markup for you and exposes each element's attributes:

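using HtmlAgilityPack;

// The sample tag from the question, closed so that it forms a complete element.
string myDiv = @"<DIV myAttribute style=""BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"" id=my_ID anotherAttribNamedDIV class=""someclass""></DIV>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(myDiv);

// The XPath "div" selects the <div> directly under the document node.
HtmlNode node = doc.DocumentNode.SelectSingleNode("div");

// Literal1 is assumed to be an ASP.NET Literal control, used here only to show the output.
Literal1.Text = "";

// Each HtmlAttribute exposes the parsed Name and Value of one attribute.
foreach (HtmlAttribute attr in node.Attributes)
{
    Literal1.Text += attr.Name + ": " + attr.Value + "<br />";
}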
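
Since the question asks for a dictionary rather than page output, the same loop could fill one instead. The following is only a minimal sketch under a few assumptions: `AttributeReader` and `ReadAttributes` are hypothetical names, the XPath `//*` is used to pick the first element in the fragment, and a case-insensitive comparer is chosen because HtmlAgilityPack may report attribute names in lower case.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class AttributeReader
{
    // Hypothetical helper: collects the attributes of the first element
    // in an HTML fragment into name/value pairs.
    public static Dictionary<string, string> ReadAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Case-insensitive keys, so "myAttribute" and "myattribute" hit the same entry.
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // "//*" matches any element; SelectSingleNode returns the first match or null.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*");
        if (node == null)
        {
            return result;
        }

        foreach (HtmlAttribute attr in node.Attributes)
        {
            // Attributes written without a value are stored as an empty string.
            // The indexer overwrites, so a repeated attribute keeps its last value.
            result[attr.Name] = attr.Value ?? string.Empty;
        }

        return result;
    }
}
```

Calling `AttributeReader.ReadAttributes(myDiv)` on the string above should give one entry per attribute, so that a lookup such as `attrs["id"]` returns the unquoted value and `attrs["myAttribute"]` returns an empty string.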
Martin Liversage
MikeM
-1
HtmlDocument docHtml = new HtmlWeb().Load(url);
lennon310
puru