1
<html>
<font color=#FF0000>Gaurang</font>
<font color=#00FF00>Bhavesh</font>
<font color=#FF0000>Bhupesh</font>
<font color=#FF0000>AAditya</font>
</html>

I want to parse the above string as XML in C#. When I try, it gives an error such as: '#' is an unexpected token. The expected token is '"' or '''.

Wasif Hossain
Gaurang

2 Answers

3

This looks like HTML rather than XML, so you can parse it with HtmlAgilityPack:

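// Requires the HtmlAgilityPack package, plus "using System;" and "using System.Linq;".
var doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(filename); // filename is the path to the HTML file; use doc.LoadHtml(html) for an in-memory string

// Map each <font> element's text to its color attribute.
// Note: ToDictionary throws if two elements share the same text.
var colors = doc.DocumentNode.Descendants("font")
             .ToDictionary(e => e.InnerText, e => e.Attributes["color"].Value);

foreach (var color in colors)
{
    Console.WriteLine("{0}:{1}", color.Key, color.Value);
}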
L.B
  • Hi L.B. Thanks for your answer. It works fine, but is it possible to do this without HtmlAgilityPack? I would like to avoid using a third-party DLL. – Gaurang Feb 17 '14 at 08:58
  • 1
    @user3180333 only partially working solutions: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – L.B Feb 17 '14 at 09:04
0

The sample data you posted is not valid XML. HTML and XML differ in several ways, and one of them is that HTML parsers (such as web browsers) accept attribute values without quotation marks, whereas XML requires every attribute value to be quoted. So the following is valid XML:

<font color="#FF0000">Gaurang</font>

But this is not:

<font color=#FF0000>Gaurang</font>
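
If a third-party library is not an option, one possible workaround is to add the missing quotes before handing the string to an XML parser. The following is only a minimal sketch under the assumption that the input is as simple as the sample in the question: the class name FontColorParser and the helper QuoteAttributes are hypothetical names made up for illustration, and the regular expression is fragile (it would also rewrite text such as a=b inside element content or inside already quoted values), so it is not a general HTML solution.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper class, written for illustration only.
public static class FontColorParser
{
    // Wraps unquoted attribute values in double quotes,
    // e.g. color=#FF0000 becomes color="#FF0000".
    // Values that already start with a quote are left untouched.
    public static string QuoteAttributes(string markup)
    {
        return Regex.Replace(markup, @"(\w+)=([^\s""'>]+)", "$1=\"$2\"");
    }

    public static void Main()
    {
        // Shortened copy of the sample from the question.
        string html =
            "<html>\n" +
            "<font color=#FF0000>Gaurang</font>\n" +
            "<font color=#00FF00>Bhavesh</font>\n" +
            "</html>";

        // After quoting, the string is well-formed XML and LINQ to XML can parse it.
        XDocument doc = XDocument.Parse(QuoteAttributes(html));

        foreach (XElement font in doc.Descendants("font"))
        {
            // The cast returns null instead of throwing when the attribute is missing.
            Console.WriteLine("{0}:{1}", font.Value, (string)font.Attribute("color"));
        }
    }
}
```

This uses only System.Text.RegularExpressions and System.Xml.Linq from the .NET base class library. For real-world HTML (unclosed tags, entities such as &nbsp;, scripts), a dedicated HTML parser remains the safer choice.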
Adam Valpied