Parse XDocument with support for sloppy HTML like browsers

Asked Jun 22 '21 at 17:24

Active Jun 24 '21 at 19:25

Viewed 189 times

Web browsers support HTML documents that are not proper XML. For example, browsers allow unclosed <p>, <link>, <meta> or other tags.

In C#, how can I parse an HTML string to an XDocument and have any invalid XML corrected, instead of an exception being thrown?

asked Jun 22 '21 at 17:24

JamesFaix

Use an HTML parser such as AngleSharp or Html Agility Pack, not XDocument. – Crowcoder Jun 22 '21 at 17:44

That's because HTML is not XML, and it's not a subset of XML either. XHTML, on the other hand, was an attempt to make HTML compliant with XML, but it's not very common to find. – Magnetron Jun 22 '21 at 17:52

From the answer to [How to read HTML as XML?](https://stackoverflow.com/a/5472221/3744182) by Konrad Rudolph: *HTML simply isn’t the same as XML (unless the HTML actually happens to be conforming XHTML or HTML5 in XML mode). The best way is to use a HTML parser to read the HTML. Afterwards you may transform it to Linq to XML – or process it directly.* You might look at [How to convert HTML to XHTML?](https://stackoverflow.com/q/138555/3744182) to see if any of the automatic conversion tools there work for you. – dbc Jun 24 '21 at 19:24
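
A minimal sketch of the approach the comments describe: read the markup with an HTML parser, then hand the repaired output to LINQ to XML. It assumes the Html Agility Pack NuGet package (`HtmlAgilityPack`) is referenced. The class and method names (`HtmlToXDocument`, `Parse`) are made up for illustration and do not come from the question. Some inputs may still produce output that `XDocument.Parse` rejects, so treat this as a starting point rather than a guaranteed fix.

```csharp
using System.IO;
using System.Xml.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, named for illustration only.
public static class HtmlToXDocument
{
    public static XDocument Parse(string html)
    {
        var doc = new HtmlDocument();

        // Ask Html Agility Pack to serialize the parsed tree as XML
        // (closing unclosed tags and so on) instead of echoing the HTML.
        doc.OptionOutputAsXml = true;
        doc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            doc.Save(writer);

            // XDocument.Parse can still throw XmlException if the
            // serialized text is not well-formed for some unusual input.
            return XDocument.Parse(writer.ToString());
        }
    }
}
```

It could be called as `XDocument xml = HtmlToXDocument.Parse("<p>one<p>two");`. The comments also mention AngleSharp as an alternative parser, and the same parse-then-convert idea applies there.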

0 Answers