I'm trying to parse an HTML page with XPathDocument, but it throws an error because the HTML is not well-formed XML. Is there a way to do this?
Check here: http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c – pinichi Oct 15 '10 at 07:26
2 Answers
Use something like Html Agility Pack, which loads your HTML into a DOM object that you can traverse with, for example, XPath queries.
Unless your HTML is in fact XHTML, it is usually not valid XML: tags are often left unclosed or improperly nested, which is why XPathDocument rejects it.
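
As a rough illustration, here is a minimal sketch of querying malformed HTML with XPath through Html Agility Pack. It assumes the HtmlAgilityPack NuGet package is referenced; the `HtmlLinks` class, the `Extract` helper, and the sample markup are made up for this example. Note that `LoadHtml` (like `Load`) returns nothing, and `SelectNodes` returns null rather than an empty collection when no node matches, which can look like a silent failure.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class HtmlLinks
{
    // Hypothetical helper: returns the href of every <a> element in the markup.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // returns void; use doc.Load(path) to read a file instead

        var hrefs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return hrefs;
        }

        foreach (var anchor in anchors)
        {
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }

    public static void Main()
    {
        // Deliberately malformed sample: <br>, <p>, <body> and <html> are never closed.
        const string html = "<html><body><p><a href='/a'>A</a><br><a href='/b'>B</a>";

        foreach (var href in Extract(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

If a query unexpectedly finds nothing, checking `SelectNodes` for null and inspecting `doc.ParseErrors` is usually a reasonable first step before suspecting the XPath expression itself.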

Mikael Svenson
I would like to upvote this answer, but Html Agility Pack does not work with the document I'm giving it. The LoadFile() method has no return value and does not throw an exception either. The document also returns nothing when I query it, so I'm assuming the code has "silently failed" when this happens? – ConradB Jan 02 '13 at 15:46
Hi @ConradB, have you tried the sample at http://htmlagilitypack.codeplex.com/wikipage?title=Examples? Load is not supposed to return anything, but after calling it you should be able to select nodes and loop over them. – Mikael Svenson Jan 02 '13 at 20:29