Duplicate: Looking for C# HTML parser. Please close.
Can you recommend a library for reading HTML files as XML in .NET? I'd prefer to work with XML objects rather than raw text. Ideally, it should also fix HTML formatting errors.
You may want to rethink this. HTML and XML are not equivalent.
A good example of this is self-closing tags.
In XML, an element with no content can be written as a self-closing tag, like the following:
<br/>
while the HTML standard writes elements that have no content as single tags, with no closing tag:
<br>
<link rel="...">
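In HTML 4, using the XML syntax is technically a violation, because under its SGML rules /> has a different meaning. In practice browsers tolerate it, but a strict parser is not required to.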
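To make the difference concrete, here is a minimal sketch. It is written in Python rather than .NET only so that it stays short and uses nothing outside the standard library. The helper name is_well_formed_xml is made up for this illustration. It feeds both spellings of the tag to a strict XML parser, which is roughly what happens if you hand raw HTML to an XML API.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised for things like an unclosed <br>, which is fine in HTML
        # but breaks XML well-formedness.
        return False


html_style = "<p>line one<br>line two</p>"   # valid HTML, not XML
xml_style = "<p>line one<br/>line two</p>"   # well-formed XML

print(is_well_formed_xml(html_style))  # False
print(is_well_formed_xml(xml_style))   # True
```

This is why loading HTML straight into an XML document class tends to fail on real pages: you need a parser that understands HTML's rules first and only then exposes the result as a tree.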
There are more examples of these issues in the following article.
` becomes explicitly closed, etc.
– Pavel Minaev Jul 16 '09 at 17:22