
I wrote a small website bot in C# using the default WebBrowser control. Almost everything works as intended, but I'm having problems with the very last step of my automation.

The website is built from several iframes. That isn't a big deal in itself, since I simply access the frames and their elements using

webBrowser1.Document.Window.Frames[0].Document.GetElementById("element").InvokeMember("click");

However, this does not work when the IFRAME's source is hosted on a different domain than the main page. While searching for an answer, I came across an MSDN article describing this exact problem; it points to security measures against cross-site scripting as the likely cause of the error.
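The rule behind this error is the browser's same-origin policy: a frame's document is only reachable from the parent page when the scheme, host and port of both URLs match. As a rough sketch, a bot could predict up front which frames will fail by comparing the page URL with each iframe's src attribute (readable from the parent document, e.g. via HtmlElement.GetAttribute("src")). The class OriginCheck and its methods are hypothetical names used only for illustration:

```csharp
using System;

// Hypothetical helper, not part of any library: predicts whether a frame's
// document will be reachable without triggering the cross-domain check.
public static class OriginCheck
{
    // Two URLs share an origin when scheme, host and port all match.
    // Uri.Port returns the scheme's default (80 for http, 443 for https)
    // when the URL does not specify one.
    public static bool IsSameOrigin(Uri a, Uri b)
    {
        return string.Equals(a.Scheme, b.Scheme, StringComparison.OrdinalIgnoreCase)
            && string.Equals(a.Host, b.Host, StringComparison.OrdinalIgnoreCase)
            && a.Port == b.Port;
    }

    // Resolves an iframe's src attribute (which may be relative) against the
    // parent page's URL, then compares the two origins.
    public static bool IsFrameSameOrigin(Uri pageUrl, string frameSrc)
    {
        var frameUrl = new Uri(pageUrl, frameSrc);
        return IsSameOrigin(pageUrl, frameUrl);
    }
}
```

This only looks at the frame's initial src; if the frame later navigates or is redirected elsewhere, its actual origin can differ.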

I couldn't find a way to disable this behaviour, so I rewrote everything to use geckofx-12 instead of the default (IE-based) WebBrowser control, but I ran into similar issues.

My question is: is there any way to bypass this behaviour? I don't care about the security implications, or whether geckofx or the default WebBrowser control is used; I just want to programmatically access the elements of a page hosted on a different domain without getting an UnauthorizedAccessException.

I would love to get advice from the gurus out there.

beta

3 Answers


You can't access frames from a different domain; that is a security feature. There is a small hack around it, though:

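using System;
using System.Runtime.InteropServices;
using mshtml;   // IHTMLWindow2, IHTMLDocument2 (reference: Microsoft.mshtml)
using SHDocVw;  // IWebBrowser2 (COM reference: Microsoft Internet Controls)

public class CrossFrameIE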
{
    // Returns null in case of failure.
    public static IHTMLDocument2 GetDocumentFromWindow(IHTMLWindow2 htmlWindow)
    {
        if (htmlWindow == null)
        {
            return null;
        }

        // First try the usual way to get the document.
        try
        {
            IHTMLDocument2 doc = htmlWindow.document;                

            return doc;
        }
        catch (COMException comEx)
        {
            // A COMException is probably never thrown here, but check just to be sure.
            if (comEx.ErrorCode != E_ACCESSDENIED)
            {
                return null;
            }
        }
        catch (System.UnauthorizedAccessException)
        {
        }
        catch
        {
            // Any other error.
            return null;
        }

        // At this point the error was E_ACCESSDENIED because the frame contains a document from another domain.
        // IE tries to prevent a cross frame scripting security issue.
        try
        {
            // Convert IHTMLWindow2 to IWebBrowser2 using IServiceProvider.
            IServiceProvider sp = (IServiceProvider)htmlWindow;

            // Use IServiceProvider.QueryService to get IWebBrowser2 object.
            Object brws = null;
            sp.QueryService(ref IID_IWebBrowserApp, ref IID_IWebBrowser2, out brws);

            // Get the document from IWebBrowser2.
            IWebBrowser2 browser = (IWebBrowser2)(brws);

            return (IHTMLDocument2)browser.Document;
        }
        catch
        {
        }

        return null;
    }

    private const int E_ACCESSDENIED = unchecked((int)0x80070005L);
    private static Guid IID_IWebBrowserApp = new Guid("0002DF05-0000-0000-C000-000000000046");
    private static Guid IID_IWebBrowser2 = new Guid("D30C1661-CDAF-11D0-8A3E-00C04FC9E26E");
}

// This is the COM IServiceProvider interface, not System.IServiceProvider .Net interface!
[ComImport(), ComVisible(true), Guid("6D5140C1-7436-11CE-8034-00AA006009FA"),
InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)]
public interface IServiceProvider
{
    [return: MarshalAs(UnmanagedType.I4)]
    [PreserveSig]
    int QueryService(ref Guid guidService, ref Guid riid, [MarshalAs(UnmanagedType.Interface)] out object ppvObject);
}
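Why does the COMException branch above almost never fire? When a COM call fails, the .NET interop layer converts the HRESULT into a managed exception, and E_ACCESSDENIED (0x80070005) maps to UnauthorizedAccessException rather than COMException. The sketch below, with a made-up class name, shows where the constant comes from (it is the Win32 error ERROR_ACCESS_DENIED = 5 wrapped by the SDK macro HRESULT_FROM_WIN32) and lets you confirm the mapping with Marshal.GetExceptionForHR:

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative helper (hypothetical name) explaining the E_ACCESSDENIED
// constant used by CrossFrameIE.
public static class AccessDeniedInfo
{
    public const int ERROR_ACCESS_DENIED = 5;  // Win32 error code
    public const int FACILITY_WIN32 = 7;

    // Equivalent of the Windows SDK macro HRESULT_FROM_WIN32: set the failure
    // bit, put FACILITY_WIN32 in the facility field (bits 16-26) and keep the
    // low 16 bits of the Win32 code.
    public static int HResultFromWin32(int win32Error)
    {
        if (win32Error <= 0)
        {
            return win32Error;  // already an HRESULT (or success)
        }
        uint hr = 0x80000000u | ((uint)FACILITY_WIN32 << 16) | ((uint)win32Error & 0xFFFFu);
        return unchecked((int)hr);
    }

    // Returns the managed exception type the runtime uses for a failing HRESULT,
    // or null for a success code.
    public static Type ManagedExceptionFor(int hresult)
    {
        Exception ex = Marshal.GetExceptionForHR(hresult);
        return ex == null ? null : ex.GetType();
    }
}
```

This is why the hack can simply catch UnauthorizedAccessException and fall through to the IServiceProvider path.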
Daniel Bogdan

I slightly updated the hack Daniel Bogdan posted so that it uses extension methods and can be called without working with the mshtml types directly:

using mshtml;
using SHDocVw;
using System;
using System.Reflection;
using System.Runtime.InteropServices;
using System.Windows.Forms;

namespace TradeAutomation
{
    public static class CrossFrameIE
    {
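        // Both members are WinForms implementation details (a private field and an internal
        // constructor), not public API, so this reflection may break on other framework versions.
        private static FieldInfo ShimManager = typeof(HtmlWindow).GetField("shimManager", BindingFlags.NonPublic | BindingFlags.Instance);
        private static ConstructorInfo HtmlDocumentCtor = typeof(HtmlDocument).GetConstructors(BindingFlags.NonPublic | BindingFlags.Instance)[0];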

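        public static HtmlDocument GetDocument(this HtmlWindow window)
        {
            var rawDocument = (window.DomWindow as IHTMLWindow2).GetDocumentFromWindow();
            if (rawDocument == null)
            {
                // Neither the normal path nor the IServiceProvider hack worked.
                return null;
            }

            var shimManager = ShimManager.GetValue(window);

            // Wrap the raw MSHTML document in a managed HtmlDocument via its internal constructor.
            var htmlDocument = HtmlDocumentCtor
                .Invoke(new[] { shimManager, rawDocument }) as HtmlDocument;

            return htmlDocument;
        }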


        // Returns null in case of failure.
        public static IHTMLDocument2 GetDocumentFromWindow(this IHTMLWindow2 htmlWindow)
        {
            if (htmlWindow == null)
            {
                return null;
            }

            // First try the usual way to get the document.
            try
            {
                IHTMLDocument2 doc = htmlWindow.document;

                return doc;
            }
            catch (COMException comEx)
            {
                // A COMException is probably never thrown here, but check just to be sure.
                if (comEx.ErrorCode != E_ACCESSDENIED)
                {
                    return null;
                }
            }
            catch (System.UnauthorizedAccessException)
            {
            }
            catch
            {
                // Any other error.
                return null;
            }

            // At this point the error was E_ACCESSDENIED because the frame contains a document from another domain.
            // IE tries to prevent a cross frame scripting security issue.
            try
            {
                // Convert IHTMLWindow2 to IWebBrowser2 using IServiceProvider.
                IServiceProvider sp = (IServiceProvider)htmlWindow;

                // Use IServiceProvider.QueryService to get IWebBrowser2 object.
                Object brws = null;
                sp.QueryService(ref IID_IWebBrowserApp, ref IID_IWebBrowser2, out brws);

                // Get the document from IWebBrowser2.
                IWebBrowser2 browser = (IWebBrowser2)(brws);

                return (IHTMLDocument2)browser.Document;
            }
            catch
            {
            }

            return null;
        }

        private const int E_ACCESSDENIED = unchecked((int)0x80070005L);
        private static Guid IID_IWebBrowserApp = new Guid("0002DF05-0000-0000-C000-000000000046");
        private static Guid IID_IWebBrowser2 = new Guid("D30C1661-CDAF-11D0-8A3E-00C04FC9E26E");
    }

    // This is the COM IServiceProvider interface, not System.IServiceProvider .Net interface!
    [ComImport(), ComVisible(true), Guid("6D5140C1-7436-11CE-8034-00AA006009FA"),
    InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IServiceProvider
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int QueryService(ref Guid guidService, ref Guid riid, [MarshalAs(UnmanagedType.Interface)] out object ppvObject);
    }
}

Usage:

webBrowser1.Document.Window.Frames["main"].GetDocument();

You'll also need to add a reference to SHDocVw (the COM component "Microsoft Internet Controls"). Directions are in the question "Add reference 'SHDocVw' in C# project using Visual C# 2010 Express".

Drew Delano

I haven't tried this, but changing the document's domain (document.domain) reportedly works.

With geckofx 12, it looks like this could be done via nsIDOMHTMLDocument.SetDomainAttribute. (GeckoDocument.Domain doesn't have a setter, but you could easily add one.)

I.e., if you change the document's domain to match the subframe's, you might be able to access it.
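One caveat: document.domain can only be relaxed to the page's own host or one of its parent domains, and the framed page generally has to set the same value too. So the trick can help when both pages are subdomains of a shared domain (say www.example.com and frames.example.com), but not for unrelated domains. A simplified sketch of that rule follows; the class and method names are invented, and real browsers additionally reject public suffixes such as "com" or "co.uk" and IP addresses, for which the two-label minimum below is only a crude stand-in:

```csharp
using System;

// Hypothetical helper modelling the document.domain relaxation rule.
public static class DocumentDomain
{
    // True if a page served from `host` may set document.domain = `domain`:
    // the new value must be the host itself or one of its parent domains.
    public static bool CanRelaxTo(string host, string domain)
    {
        host = host.ToLowerInvariant();
        domain = domain.ToLowerInvariant();
        return host == domain || host.EndsWith("." + domain, StringComparison.Ordinal);
    }

    // Returns the most specific domain that both hosts could set document.domain to,
    // or null if they share no parent domain of at least two labels.
    public static string CommonDomain(string hostA, string hostB)
    {
        string[] labels = hostA.ToLowerInvariant().Split('.');
        // Walk from the full host towards shorter parents, keeping at least two labels.
        for (int i = 0; i <= labels.Length - 2; i++)
        {
            string candidate = string.Join(".", labels, i, labels.Length - i);
            if (CanRelaxTo(hostB, candidate))
            {
                return candidate;
            }
        }
        return null;
    }
}
```

If CommonDomain returns null for the page's host and the frame's host, changing the domain will not make the frame accessible.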

Neeraj Dubey
Tom
    Unfortunately I got a COM exception when trying this method. However, I got things working by getting the iframes with webBrowser1.Document.GetElementsByTagName("iframe") and accessing their content document via ((Gecko.DOM.GeckoIFrameElement)frames[0]).ContentDocument, which worked flawlessly. I marked your answer as the solution anyway, since there were no other answers at the time. – beta May 21 '12 at 17:41