I'm looking for a free tool or DLLs that I can use from my own .NET code to process some web requests. Say I have a URL with query string parameters, similar to http://www.example.com?param=1. When I open it in a browser, several redirects occur and eventually a page is rendered that contains a frameset, and the inner HTML of one frame contains a table with the data I need. I want to store this data in an external file in CSV format. The data differs depending on the query string parameter param, and I want to run the application once and generate 1000 CSV files, for param values from 1 to 1000.

I have good knowledge of .NET, JavaScript and HTML; the main problem is how to get the final HTML in my code.

What I tried: I created a new Windows Forms application, added a WebBrowser control and used code like this:

    private void FormMain_Shown(object sender, EventArgs e)
    {
        var param = 1; //test
        var url = string.Format(Constants.URL_PATTERN, param);

        WebBrowserMain.Navigated += WebBrowserMain_Navigated;
        WebBrowserMain.Navigate(url);
    }

    void WebBrowserMain_Navigated(object sender, WebBrowserNavigatedEventArgs e)
    {
        if (e.Url.OriginalString == Constants.FINAL_URL)
        {
            var document = WebBrowserMain.Document.Window.Frames[0].Document;
        }
    }

Unfortunately I receive an UnauthorizedAccessException, probably because the frame and the top-level document are in different domains. Does anybody have an idea how to work around this, or maybe a completely different approach to implementing this kind of functionality?

Ihor Deyneka
  • It isn't clear, are you trying to execute this Winforms app on the server side? – noseratio Oct 29 '13 at 09:40
  • I have no access to the server. This is a WinForms app that I've built and run on the client side; it navigates to the external server by URL to get HTML that I have no control over. Then I need to process the HTML to get the necessary table data. – Ihor Deyneka Oct 29 '13 at 09:59
  • Try handling `DocumentCompleted` instead of `Navigated`. Better yet, you're after the DOM `window.onload` event of the top page. Check this for more details: http://stackoverflow.com/a/19283143/1768303 – noseratio Oct 29 '13 at 10:25
  • DocumentCompleted makes no difference: I still get an UnauthorizedAccessException. I am not able to access the HTML that is inside the FRAME. – Ihor Deyneka Oct 29 '13 at 10:34
  • If you're sure the URL of the inner frame is from a different domain, here's how you can get to the frame, it's tricky: http://stackoverflow.com/q/3508317/1768303 – noseratio Oct 29 '13 at 10:42

1 Answer

Thanks to noseratio's comments I managed to do this with the WebBrowser control. Here are the main points that might help others with similar questions:

1) The DocumentCompleted event should be used. In the Navigated event the body of the document is still null.

2) The following answer helped a lot: "WebBrowserControl: UnauthorizedAccessException when accessing property of a Frame".

3) I was not aware of IHTMLWindow2 and similar interfaces. For them to work correctly I added references to the following COM libraries: Microsoft Internet Controls (SHDocVw) and Microsoft HTML Object Library (MSHTML).

4) I grabbed the html of the frame with the following code:

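    void WebBrowserMain_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (e.Url.OriginalString == Constants.FINAL_URL)
        {
            try
            {
                // Drop down to the COM interfaces (mshtml) behind the managed wrapper.
                var doc = (IHTMLDocument2) WebBrowserMain.Document.DomDocument;
                var frame = (IHTMLWindow2) doc.frames.item(0);

                // CrossFrameIE is the helper class from the answer mentioned in point 2.
                var document = CrossFrameIE.GetDocumentFromWindow(frame);
                var html = document.body.outerHTML;

                var dataParser = new DataParser(html);
                //my logic here
            }
            catch (Exception ex)
            {
                // The original error handling is not shown; report as appropriate.
                System.Diagnostics.Debug.WriteLine(ex);
            }
        }
    }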

5) To work with the HTML I used the HTML Agility Pack, which has pretty good XPath search support.
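
For the last step, a minimal sketch of turning the frame's HTML into CSV with the HTML Agility Pack might look like the following. It assumes the data sits in the first table of the frame and that nested tables do not matter; the class name TableCsv and its methods are hypothetical and are not the DataParser used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class TableCsv
{
    // Quote a field only when it contains a comma, a quote or a line break.
    public static string EscapeField(string value)
    {
        if (value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Convert the rows of the first <table> in the HTML into CSV text.
    // Assumption: the first table is the one that holds the data.
    public static string ToCsv(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("(//table)[1]//tr");
        if (rows == null)
            return string.Empty;

        var lines = new List<string>();
        foreach (var row in rows)
        {
            var cells = row.SelectNodes("./th|./td");
            if (cells == null)
                continue; // skip rows without cells

            lines.Add(string.Join(",", cells.Select(
                c => EscapeField(WebUtility.HtmlDecode(c.InnerText).Trim()))));
        }
        return string.Join(Environment.NewLine, lines);
    }
}
```

Inside the DocumentCompleted handler this could be used as File.WriteAllText(string.Format("data_{0}.csv", param), TableCsv.ToCsv(html)), where the file name pattern is only an example.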

wp78de
Ihor Deyneka