6

I'd like to open an HTML document (retrieved from the web as a string via a StreamReader) by creating an XmlDocument this way:

XmlDocument doc = new XmlDocument();

doc.LoadXml(html); // html is the string containing the retrieved document

But the HTML document starts with this DOCTYPE declaration:

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" > 

It tells me that the document is invalid. Is there any way to work around this?

Brian Rasmussen
Vincent S

4 Answers

3

Normal HTML, even if it's valid HTML, is not necessarily valid XML.

HtmlAgilityPack is a popular third-party open source library that you can use to solve this problem.
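As a rough sketch of what that could look like (assuming the HtmlAgilityPack NuGet package is installed; the `HtmlTitleReader` class and `GetTitle` method are made-up names for illustration only), the retrieved string can be handed to the library's tolerant parser instead of XmlDocument:

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

static class HtmlTitleReader
{
    // Hypothetical helper: returns the text of the <title> element,
    // or null when the page has none.
    public static string GetTitle(string html)
    {
        var doc = new HtmlDocument();

        // LoadHtml parses the markup leniently, so the DOCTYPE and
        // unclosed tags such as <br> are accepted.
        doc.LoadHtml(html);

        // SelectSingleNode takes an XPath expression and returns null
        // when nothing matches.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title != null ? title.InnerText : null;
    }
}
```

Here `html` would be the string read from the StreamReader. The resulting node tree can be queried with XPath in much the same way as an XmlDocument.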

MattAllegro
rtpHarry
0

One can use HTML Tidy (Tidy.NET) for this.

Zain Ali
0

If you're positive that the HTML is valid XML, I imagine you could simply replace the HTML DOCTYPE declaration with an XML declaration.
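A related option, which is a different technique from editing the string by hand, is to ask the XML reader to skip the DOCTYPE. The sketch below assumes .NET 4.0 or later (where `DtdProcessing` is available) and markup that really is well-formed; `XhtmlLoader` and `LoadIgnoringDoctype` are hypothetical names:

```csharp
using System;
using System.IO;
using System.Xml;

static class XhtmlLoader
{
    // Hypothetical helper: loads well-formed XHTML into an XmlDocument
    // while skipping its <!DOCTYPE ...> declaration.
    public static XmlDocument LoadIgnoringDoctype(string xhtml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // skip the DOCTYPE instead of processing it
            XmlResolver = null                    // never try to download the DTD
        };

        var doc = new XmlDocument();
        using (var stringReader = new StringReader(xhtml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            doc.Load(xmlReader);
        }
        return doc;
    }
}
```

Two caveats: with the DTD ignored, named entities such as `&nbsp;` are undeclared and will cause an XmlException, and if the root element declares the XHTML namespace, XPath queries need an XmlNamespaceManager.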

Jemes
0

First, you have to check that the document is valid XHTML (which means it is also a valid XML document).

Paste your XHTML code here and review the output: http://validator.w3.org/#validate_by_input

Good luck!
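For a quick local check before using the validator, something like the following might help. It only tests XML well-formedness, not XHTML validity, and `WellFormedness` and `FindFirstError` are hypothetical names:

```csharp
using System;
using System.IO;
using System.Xml;

static class WellFormedness
{
    // Hypothetical helper: returns null when the text parses as XML,
    // otherwise a message giving the position of the first error.
    public static string FindFirstError(string xhtml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // do not process the DOCTYPE
            XmlResolver = null                    // no network access
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(xhtml), settings))
            {
                // Reading to the end forces the whole document to be parsed.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return string.Format("Line {0}, position {1}: {2}",
                                 ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }
}
```

If this returns a message, the document cannot be loaded into an XmlDocument as it stands.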

fdaines