
Duplicate: Looking for C# HTML parser. Please close.

Can you recommend a library for reading HTML files as XML in .NET? I'd prefer to work with XML objects rather than raw text. Ideally, it should also fix HTML formatting errors.

Alex Yakunin

1 Answer


You may want to rethink this. HTML and XML are not equivalent.

A good example of this is self-closing tags.

The XML standard writes a self-closing (empty-element) tag like this:

<br/>

whereas HTML writes content-less (void) elements as a single tag with no closing slash:

<br>
<link rel="...">

In HTML 4, which is SGML-based, using the XML syntax is actually a violation, because /> has a different meaning there.
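To make the difference concrete, here is a minimal C# sketch showing that an XML parser accepts the <br/> form but rejects HTML's bare <br>. It assumes only .NET's built-in System.Xml.Linq; the XmlCheck class and IsWellFormedXml helper are hypothetical names chosen for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlCheck
{
    // Returns true if the markup parses as well-formed XML.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            // To an XML parser, "<br>" is a start tag that is never closed,
            // so the later "</p>" end tag does not match it.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormedXml("<p>line one<br/>line two</p>")); // True
        Console.WriteLine(IsWellFormedXml("<p>line one<br>line two</p>"));  // False
    }
}
```

This is why feeding typical HTML straight into XDocument or XmlDocument fails: something has to rewrite the HTML into well-formed markup first.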

There are more examples of these issues in the following article.

Tim Hoolihan
    That's precisely the point of the question - he wants a library that would read HTML, with all its quirks, and expose it as well-formed XHTML. So `<br>` gets translated to `<br/>`, implicitly-closed `<p>` becomes explicitly closed, etc.

    – Pavel Minaev Jul 16 '09 at 17:22
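A real HTML parser performs this translation for you. Purely as an illustration of the void-element part of the rewrite, here is a toy C# sketch that turns <br> into <br/> (and likewise for the other HTML void elements) so that XDocument can load the result. The VoidTagNormalizer name and the regex are hypothetical, not any library's API. This is a sketch under stated assumptions, not a parser: it ignores comments and <script> blocks, breaks on a '>' inside an attribute value, and does not close implicitly-closed elements such as <p> or <li>.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class VoidTagNormalizer
{
    // HTML void elements: they never have content or an end tag.
    // The \b stops matches on longer names such as <colgroup>.
    private static readonly Regex VoidTag = new Regex(
        @"<(area|base|br|col|embed|hr|img|input|link|meta|param|source|track|wbr)\b([^>]*?)\s*/?>",
        RegexOptions.IgnoreCase);

    // Rewrites "<br>" as "<br/>" and "<link rel=...>" as "<link rel=.../>".
    // Toy only: no handling of comments, scripts, or '>' inside attribute values.
    public static string Normalize(string html) => VoidTag.Replace(html, "<$1$2/>");

    public static void Main()
    {
        string xhtml = Normalize("<div>line one<br>line two<hr></div>");
        Console.WriteLine(xhtml); // <div>line one<br/>line two<hr/></div>
        XDocument.Parse(xhtml);   // now well-formed, so no XmlException is thrown
    }
}
```

Handling the rest of HTML's quirks (implicit end tags, unquoted attributes, misnested elements) is exactly why a dedicated HTML parser is the better route than hand-rolled rewriting.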