
I have an HTML file that is a well-formed XML document (all tags are paired), but it contains an anchor like the one below:

<a href="mailto:test@domain.com?subject=Hello&body=someMessageHere" target="_top" style="text-decoration: none;">link</a>

The XML parser invoked by XDocument.Load throws an XmlException that says:

Additional information: '=' is an unexpected token. The expected token is ';'.

How can I tell the parser that '&body' is not an entity? Do I have to escape the '&' character?

Tim S. Van Haren
user3284063

1 Answer


Not all HTML is valid XML, so you shouldn't try to parse it as such (although, in this case, it looks like the document contains some unescaped strings that should probably be fixed).

Instead, you should use something like the HTMLAgilityPack to parse your HTML and work with the document that way.
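
On the narrower question of escaping: in XML a literal '&' inside an attribute value has to be written as '&amp;', and the parser cannot be told to treat '&body' as plain text. The sketch below only illustrates that point with the anchor from the question. The `AmpersandDemo` class and `ReadHref` helper are hypothetical names, and the `Replace` call is a stand-in for fixing the source file, not a general-purpose HTML sanitizer.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class AmpersandDemo
{
    // Hypothetical helper: returns the href of the root element,
    // or null when the markup is not well-formed XML.
    public static string ReadHref(string markup)
    {
        try
        {
            XDocument doc = XDocument.Parse(markup);
            return (string)doc.Root.Attribute("href");
        }
        catch (XmlException)
        {
            // A bare '&' starts an entity reference, so "&body=" is rejected.
            return null;
        }
    }

    public static void Main()
    {
        string raw = "<a href=\"mailto:test@domain.com?subject=Hello&body=someMessageHere\">link</a>";

        // Escaping the ampersand makes the attribute value legal XML.
        string escaped = raw.Replace("&body=", "&amp;body=");

        Console.WriteLine(ReadHref(raw) ?? "(not well-formed XML)");
        Console.WriteLine(ReadHref(escaped));
    }
}
```

Once parsed, the attribute value comes back with a plain '&' again, so the mailto link itself is unchanged. This only helps when you control the file; for arbitrary HTML a dedicated HTML parser remains the safer route.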

Justin Niessner
  • Is it possible that valid HTML is not valid XML? I thought that HTML is just a narrowed XML (?) – user3284063 Jun 12 '14 at 15:29
  • HTML is more flexible and forgiving (closing tags, character encoding, etc.), and browsers are fairly forgiving when rendering it. XHTML is the only markup that *should* be guaranteed to be valid XML. – Justin Niessner Jun 12 '14 at 15:32