I'm trying to parse an HTML page with XPathDocument, but it throws an error because the HTML is not valid XML. Is there a way to do this?

ghiboz

2 Answers

You should use HtmlAgilityPack. It's still the best option!

carla
pinichi

Use something like the Html Agility Pack, which can load your HTML into a DOM object that can then be traversed with, for example, XPath queries.
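Html Agility Pack is a .NET library, and the sketch below neither uses nor reproduces its API. As a rough illustration of the same idea (parse loose HTML tolerantly into a tree, then query it XPath-style), here is a minimal Python sketch using only the standard library: `html.parser.HTMLParser` feeds an `xml.etree.ElementTree` tree, and `findall` runs ElementTree's limited XPath subset. The `TolerantTreeBuilder` class and `parse_html` function are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TolerantTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree tree from non-XML HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # synthetic root, so fragments also work
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        # Boolean attributes (e.g. <input disabled>) arrive with value None.
        attrib = {name: value or "" for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_ELEMENTS:
            self.stack.append(elem)  # void elements are never left "open"

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # ElementTree stores text that follows a child element in that child's .tail.
        parent = self.stack[-1]
        if len(parent):
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_html(markup):
    """Parse loose HTML and return the synthetic root element."""
    builder = TolerantTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


# The unclosed <p> and <br> here would make a strict XML parser fail.
page = """<html><body>
<p>First <a href="/one">link</a>
<p>Second paragraph<br>
<a href="/two">another</a>
</body></html>"""

root = parse_html(page)
hrefs = [a.get("href") for a in root.findall(".//a")]
print(hrefs)  # ['/one', '/two']
```

This is deliberately simplified: unlike a standards-compliant HTML5 parser, it does not implicitly close the first `<p>` when the second one opens, so unclosed elements just nest. That is enough for descendant queries such as `.//a` or `.//a[@href='/two']`, but a real HTML parser library is the better choice for production use.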

Unless your HTML is in fact XHTML, it is usually not a valid XML structure with correctly matched opening and closing tags.
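To illustrate why a strict XML parser rejects ordinary HTML, here is a small Python analogue (not the .NET code itself) using the standard library's `xml.etree.ElementTree`, which, like XPathDocument, only accepts well-formed XML. The `is_well_formed_xml` helper is a hypothetical name for this sketch. An unclosed `<br>` and `<p>` make the parse fail, while the same content written as XHTML is accepted.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical HTML: <br> and both <p> elements are never closed.
html_page = "<html><body><p>Line one<br>Line two<p>Next</body></html>"
# The same content as XHTML: every element is explicitly closed.
xhtml_page = "<html><body><p>Line one<br/>Line two</p><p>Next</p></body></html>"

print(is_well_formed_xml(html_page))   # False
print(is_well_formed_xml(xhtml_page))  # True
```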

Mikael Svenson
  • I would like to upvote this answer, but HtmlAgilityPack does not work with the document I'm giving it. The LoadFile() method has no return value and does not throw an exception either, and querying the document returns nothing, so I'm assuming the code has "silently failed" when this happens? –  Jan 02 '13 at 15:46
  • Hi @ConradB, have you tried the sample at http://htmlagilitypack.codeplex.com/wikipage?title=Examples? Load should not return anything, but afterwards you should be able to loop over the nodes and run selections on them. – Mikael Svenson Jan 02 '13 at 20:29