
I am seeing strange behavior when using the InternetExplorer object to read information from HTTP sites.

My code so far:

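using SHDocVw;   // COM reference "Microsoft Internet Controls"
using mshtml;    // COM reference "Microsoft HTML Object Library"

string url = "http://www.someURL.com";
InternetExplorer IE = new InternetExplorer();
IE.Visible = true;
IE.Navigate(url);
// Poll until IE reports that it is no longer busy navigating.
while (IE.Busy) {
    System.Threading.Thread.Sleep(500);
}

// Read the rendered markup of the <body> element, then close the browser.
IHTMLDocument2 htmlDoc = IE.Document as IHTMLDocument2;
string sourceCode = htmlDoc.body.outerHTML;
IE.Quit();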

This code works for some websites but not for others. For those where it does not work, it fails with

System.Runtime.InteropServices.COMException

HRESULT E_FAIL

I found a workaround for those websites, but I am not happy with it:

private static SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindows();
foreach (SHDocVw.InternetExplorer ii in shellWindows) {
    // Check URL or title to figure out if this is my window
}

When I have several windows open with the same title or URL, I cannot tell which one is the window I opened earlier.
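One way to tell the windows apart, sketched here under assumptions rather than as a confirmed fix, is to remember the frame window handle (`HWND`) of the instance right after creating it and then match on that handle instead of the title or URL. `IeWindowFinder` and `FindByHwnd` are hypothetical names introduced for this illustration. The sketch assumes that the handle stays the same when the original COM reference stops working, and that the window holds a single tab, since tabs in one frame may report the same handle.

```csharp
using System.Runtime.InteropServices;
using SHDocVw; // COM reference "Microsoft Internet Controls"

static class IeWindowFinder
{
    // Hypothetical helper: returns the shell window whose frame handle
    // equals hwnd, or null when no such window is found.
    public static InternetExplorer FindByHwnd(long hwnd)
    {
        var shellWindows = new ShellWindows();
        foreach (object window in shellWindows)
        {
            // Skip entries that are null or do not expose the browser interface.
            var candidate = window as InternetExplorer;
            if (candidate == null) continue;
            try
            {
                if (candidate.HWND == hwnd) return candidate;
            }
            catch (COMException)
            {
                // The window may have closed while enumerating; ignore it.
            }
        }
        return null;
    }
}

// Usage sketch: read the handle before navigating, look the window up afterwards.
//   InternetExplorer IE = new InternetExplorer();
//   long hwnd = IE.HWND;
//   IE.Navigate(url);
//   InternetExplorer mine = IeWindowFinder.FindByHwnd(hwnd);
```

Matching on the handle avoids the ambiguity of identical titles or URLs, but it still depends on enumerating `ShellWindows`, so it refines the workaround above rather than replacing it.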

Unfortunately, I cannot share the failing URL because it is not reachable from outside my infrastructure.

Julian
  • 1
    If you just want to fetch the raw (unparsed) source something like WebClient is far superior. If you do want parsing there are [better headless browsers than IE](https://stackoverflow.com/questions/10161413/headless-browser-for-c-sharp-net) – Alex K. Sep 13 '17 at 13:52
  • 1
    `SHDocVw` is a shell document and uses the version of IE installed on the system. Not sure about `InternetExplorer`, but it could be similar to the `WebBrowser` component of Visual Studio, which in some cases does not support HTML5 etc. and behaves mostly like IE7. You may want to try emulating IE11. Possibly this link could give some hints: https://stackoverflow.com/questions/17922308/use-latest-version-of-internet-explorer-in-the-webbrowser-control – praty Sep 13 '17 at 13:59
  • Sorry for the late response. I was researching without success. Unfortunately, my use case is very special. My website only runs on IE and also requires a strange and complicated login procedure. Emulating IE11 or using other headless browsers fails at the login. Therefore I see no other option than looping through the shellWindows as described in the "workaround" above. Anyway, thanks for your suggestions. I learned a bit over the past days :-) – Julian Sep 15 '17 at 08:30
  • I would be glad to help with this. Is it possible for you to post some of the code for the websites that are failing? – Alexander Ryan Baggett Dec 08 '17 at 22:39
  • Also, I can't help but wonder whether you are running into a website that loads content after DOMContentLoaded or after document.readyState reaches "complete". – Alexander Ryan Baggett Dec 20 '17 at 16:19
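
The readiness concern raised in the last comment can be checked with a stricter wait loop. The following is a minimal sketch, not a confirmed fix for the `E_FAIL` error: it waits until `Busy` is false and `ReadyState` reports `READYSTATE_COMPLETE`, and gives up after a timeout. `IeWait` and `WaitUntilComplete` are hypothetical names introduced for this illustration, and the 250 ms polling interval is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using SHDocVw; // COM reference "Microsoft Internet Controls"

static class IeWait
{
    // Hypothetical helper: polls until the browser is idle and the document
    // is fully loaded. Returns false if the timeout elapses first.
    public static bool WaitUntilComplete(InternetExplorer ie, TimeSpan timeout)
    {
        var clock = Stopwatch.StartNew();
        while (ie.Busy || ie.ReadyState != tagREADYSTATE.READYSTATE_COMPLETE)
        {
            if (clock.Elapsed > timeout) return false;
            Thread.Sleep(250);
        }
        return true;
    }
}

// Usage sketch:
//   IE.Navigate(url);
//   if (!IeWait.WaitUntilComplete(IE, TimeSpan.FromSeconds(30)))
//       throw new TimeoutException("Page did not finish loading.");
```

This only covers the document's own load state. Content that scripts inject after the document is complete would still need a page-specific check, such as polling for a known element.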

0 Answers