0

I have an Html String which I need to parse, but WITHOUT using HTMLAgilityPack.

Using XPath with HTMLAgilityPack is quite simple, but how do I get XPath working without it?

Joe Almore
  • or using WebBrowser control... –  Jul 19 '16 at 03:58
  • 1
    No, you cannot use `XPath` directly because HTML is NOT XML. HTML allows tags without a closing tag, but XML does not. – Joshua Jul 19 '16 at 04:09
  • Also, what do you want to get from parsing the HTML? If you just want to extract a few values, treating the HTML as a string and using a regular expression will work. – Joshua Jul 19 '16 at 04:11
  • For WinForms there is `System.Windows.Forms.WebBrowser`, which acts like a simple web browser (http://stackoverflow.com/a/56629). On ASP.NET, you need the `Regex` class and a hand-written HTML parser, since XPath is usually used together with `HtmlAgilityPack`. – Tetsuya Yamamoto Jul 19 '16 at 04:15
  • 3
    Is there a particular reason you don't want to use HtmlAgilityPack? What you're asking is akin to saying, "I want to drive a nail, *without* using a hammer." Use the best available tool for the job. – Jim Mischel Jul 19 '16 at 04:16
  • @JimMischel The parsing routine is called from unmanaged code, hence the `HTMLAgilityPack` crashes when it tries to create the class `HtmlWeb` or when it tries to create a `HtmlDocument` from a simple `String`. That is why I can't use `HTMLAgilityPack`. – Joe Almore Jul 19 '16 at 15:47
  • Perhaps the problem isn't with HtmlAgilityPack, but rather with the way your code is handling the call from unmanaged code. I don't have any experience calling HtmlAgilityPack from a method that's called from unmanaged code, but I've done plenty of other things from such a method. It should "just work." I'd suspect an error in the way you're doing the callback, but without an example and some specifics about the error you encounter, I can't say what it is. In short, I think you're trying to solve the wrong problem. – Jim Mischel Jul 19 '16 at 15:52
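
The points raised in the comments above can be illustrated with a minimal sketch. It assumes only the standard `System.Xml` and `System.Text.RegularExpressions` namespaces; the sample markup, the class name `XPathOnHtmlDemo`, and the variable names are hypothetical and not taken from the question. It shows that XPath works when the markup happens to be well-formed XML, that typical HTML with an unclosed tag is rejected by the XML parser, and that a regular expression can still pull a single value out of the raw string.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

class XPathOnHtmlDemo
{
    static void Main()
    {
        // Well-formed markup (every tag closed) loads as XML, so XPath works.
        string wellFormed =
            "<html><body><p class=\"msg\">Hello</p><br /></body></html>";
        var doc = new XmlDocument();
        doc.LoadXml(wellFormed);
        XmlNode node = doc.SelectSingleNode("//p[@class='msg']");
        Console.WriteLine(node?.InnerText);

        // Typical HTML with an unclosed <br> is not well-formed XML,
        // so the XML parser throws instead of building a document.
        string typicalHtml =
            "<html><head><title>Demo</title></head>" +
            "<body><p>Hello<br>World</p></body></html>";
        try
        {
            new XmlDocument().LoadXml(typicalHtml);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("XML parser rejected the HTML: " + ex.Message);
        }

        // Fallback for extracting one simple value: treat the HTML as a string.
        // This is fragile and only suitable for very simple, predictable markup.
        Match m = Regex.Match(
            typicalHtml,
            @"<title>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (m.Success)
        {
            Console.WriteLine(m.Groups[1].Value);
        }
    }
}
```

Loading through `XmlDocument` is only an option when the input is known to be XHTML-like; for arbitrary real-world HTML, a dedicated HTML parser is still needed before XPath can be applied.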

1 Answer

1

I'm not sure I understood your objection to HTML Agility Pack. I've just written a lightweight HTML parser and posted it on GitHub as HTML Monkey.

The initial version does not support XPath, but I'm still working on the project and looking for feedback and suggestions.

Jonathan Wood