
I am trying to use MSHTML to get plain text from the HTML of a website. It appears to be working and produces fairly clean plain text (suggestions for a better HTML-to-plain-text solution are welcome too).

Everything works fine except that I frequently get a "Windows Security Warning" popup asking whether I want to allow the website to put cookies on my computer (I have seen this warning before when using IE). It also periodically opens Google Chrome to the Google sign-in page, which is very odd. Is there some way to disable all script and external resource loading? I only want to get the plain text and don't need the page to actually execute.

Here's my code:

using mshtml; // COM interop: add a reference to "Microsoft HTML Object Library"

HTMLDocument htmldoc = new HTMLDocument();
IHTMLDocument2 htmldoc2 = (IHTMLDocument2)htmldoc;
htmldoc2.write(new object[] { currentCode }); // parse the raw HTML string
// Flatten line breaks to spaces; the original .Replace('\"', '"') was a no-op (both literals are the same char)
currentCode = sanitizeText(htmldoc2.body.outerText.Replace('\n', ' ').Replace('\r', ' '), false, false);
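
One idea I've come across but haven't been able to verify: switching the document into design mode before writing. MSHTML reportedly does not execute scripts while designMode is "On", though I'm not sure whether that also blocks external resource loading:

// Unverified: MSHTML is reported not to run scripts while the document
// is in design mode, so this may make the write() call side-effect free.
HTMLDocument htmldoc = new HTMLDocument();
IHTMLDocument2 htmldoc2 = (IHTMLDocument2)htmldoc;
htmldoc2.designMode = "On"; // switch to design (edit) mode before writing
htmldoc2.write(new object[] { currentCode });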
  • If you use such paranoid security settings and are willing to disable scripts altogether, why not use HtmlAgilityPack, like many other web-scraper solutions in C#? – Alexei Levenkov Mar 28 '14 at 20:43
  • "Paranoid security settings"? They are set to the default :) I haven't changed my security settings. And yes, I have tried to use HtmlAgilityPack but I found that it doesn't produce very clean plain-text (at least with the code I was using). I would be happy to give HtmlAgilityPack another try though. Could you point me in the right direction to find code that would convert HTML to plain-text using HtmlAgilityPack? – abagshaw Mar 28 '14 at 20:51
  • See http://stackoverflow.com/questions/731649/how-can-i-convert-html-to-text-in-c. The HTML Agility Pack has an HTML-to-Text sample (or it did). Also, the OP employed a different method that involved using the lynx.exe text-mode browser. – Jim Mischel Mar 28 '14 at 21:50
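
Following up on the HtmlAgilityPack suggestion in the comments, here is a rough sketch of the kind of HTML-to-text conversion being discussed (an illustrative helper, not the code from the linked answer). HtmlAgilityPack only parses, so no scripts run and no external resources are fetched:

using System;
using System.Linq;
using HtmlAgilityPack; // Install-Package HtmlAgilityPack

static string HtmlToPlainText(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html); // parse only: no scripts run, nothing is downloaded

    // InnerText would include <script>/<style> contents, so remove those nodes first
    foreach (var node in doc.DocumentNode.Descendants()
             .Where(n => n.Name == "script" || n.Name == "style").ToList())
    {
        node.Remove();
    }

    // Decode entities (&amp;, &#39;, ...) and collapse runs of whitespace
    string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    return string.Join(" ", text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
}

With something along those lines, currentCode = HtmlToPlainText(currentCode) could replace the MSHTML block entirely.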

0 Answers