C#: parse HTML using XPathDocument

Question

I'm trying to parse an HTML page with XPathDocument, but it throws an error because the HTML is not valid XML. Is there a way to do this or not?

check here: http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c — pinichi, Oct 15 '10 at 07:26

score 7 · Accepted Answer · edited Nov 26 '17 at 04:52

You should use HtmlAgilityPack. It's still the best!

carla

answered Oct 15 '10 at 07:25

pinichi

score 3 · Answer 2 · answered Oct 15 '10 at 07:25

Use something like Html Agility Pack, which can load your HTML into a DOM object that can then be traversed with, for example, XPath queries.
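
As a rough illustration of that approach, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the markup string and the `//a[@href]` query are made up for the example. Note that `LoadHtml` returns nothing, and `SelectNodes` has historically returned null rather than an empty collection when the query matches nothing, so a missing null check can look like a silent failure.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

class Program
{
    static void Main()
    {
        // Hypothetical sample markup, deliberately not well-formed XML:
        // unclosed <p> and <br>, unquoted attribute value.
        string html = "<html><body><p>First<br><a href=/one>one</a>"
                    + "<p>Second <a href='/two'>two</a></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // returns void; the parsed tree hangs off doc.DocumentNode

        // Run an XPath query against the parsed tree.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");

        // Assumption: SelectNodes yields null when nothing matches, so check first.
        if (links == null)
        {
            Console.WriteLine("No matching nodes.");
            return;
        }

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue takes a default that is returned if the attribute is absent.
            Console.WriteLine(link.GetAttributeValue("href", "") + " -> " + link.InnerText);
        }
    }
}
```

For a file on disk, `doc.Load(path)` plays the same role as `LoadHtml` and likewise returns nothing.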

Unless your HTML is in fact XHTML, it is usually not a valid XML structure with correctly matched opening and closing tags.
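
To make the difference concrete, the sketch below feeds XPathDocument one well-formed string and one typical "tag soup" string; both are invented for the example. It uses only the standard System.Xml classes. Real XHTML usually also declares a default namespace and a DOCTYPE, in which case a plain `//a` query would need a namespace manager, so treat this as a simplified case.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class Program
{
    static void Main()
    {
        // Hypothetical inputs: the first is well-formed XML, the second is not
        // (unclosed <br> and <p>, unquoted attribute value).
        string wellFormed = "<html><body><p>First<br/><a href=\"/one\">one</a></p></body></html>";
        string tagSoup = "<html><body><p>First<br><a href=/one>one</a></body></html>";

        foreach (string markup in new[] { wellFormed, tagSoup })
        {
            try
            {
                // XPathDocument runs a strict XML parser over the input.
                var xdoc = new XPathDocument(new StringReader(markup));
                XPathNavigator nav = xdoc.CreateNavigator();

                XPathNodeIterator hrefs = nav.Select("//a/@href");
                while (hrefs.MoveNext())
                {
                    Console.WriteLine("href: " + hrefs.Current.Value);
                }
            }
            catch (XmlException ex)
            {
                // Ordinary HTML ends up here because it is not well-formed XML.
                Console.WriteLine("Not well-formed XML: " + ex.Message);
            }
        }
    }
}
```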

Mikael Svenson

I would like to mark this answer up, but htmlagilitypack does not work with the doc I'm giving it, the LoadFile() method does not have a return value, and does not throw an exception either. The document appears to not return anything when I query it either, so I'm assuming the code has "silently failed" when this happens? – Jan 02 '13 at 15:46
Hi @ConradB, have you tried the sample at http://htmlagilitypack.codeplex.com/wikipage?title=Examples? Load should not return anything, but it should let you loop over the nodes and run selections. – Mikael Svenson Jan 02 '13 at 20:29