
I am trying to parse HTML emails and store their content as readable plain text.

HtmlAgilityPack seems well regarded, but it leaves most of the parsing and interpreting to me, and the HTML in question is rather messy.

On the other hand, if I load a sample HTML email in IE, Firefox, or Chrome, they all parse it correctly, and a simple copy/paste gets me the text I want.

There seem to be ways to tap into Trident from C# using a Windows.Forms.WebBrowser, but since my project is command-line based, this would be a rather hackish way of doing things.

So my question, in a nutshell: is there a non-graphical way to use Trident/Gecko/Chrome to parse HTML into text?

  • I suggested MSHTML in this similar question: http://stackoverflow.com/a/5871508/246342 – Alex K. Mar 30 '12 at 16:15
  • You can still create a WebBrowser without a form. – L.B Mar 30 '12 at 16:15
  • @AlexK. I just gave it a go and, while not perfect, this is at least readable! Thanks for the tip, might use that! – user1303659 Mar 30 '12 at 16:34
  • @L.B Thanks, I haven't dabbled with Forms before. I tried simply adding a Forms.WebBrowser to my tool and making it navigate to my test file, but it crashed on me with some "ActiveX cannot be instantiated" errors. – user1303659 Mar 30 '12 at 16:42

0 Answers