9

I've been working on a web crawler written in C# using System.Windows.Forms.WebBrowser. I am trying to download a file from a website and save it on the local machine, and I need this to be fully automated. The download starts when a button is clicked: the button calls a JavaScript function that triggers the download and displays a “Do you want to open or save this file?” dialog. I do not want to click “Save as” and type in the file name manually.

I am aware of the download functions in HttpWebRequest and WebClient, but since the download is started by JavaScript, I do not know the URL of the file. FYI, the JavaScript is a doPostBack function that changes some values and submits a form.

I’ve tried getting focus on the Save As dialog from the WebBrowser control so that I could automate it from there, without much success. I know there’s a way to force the download to save instead of asking to save or open by adding a header to the HTTP request, but I don’t know how to specify the file path to download to.

John Saunders
Sharath
  • Do you have a solution to your last problem, how to download the file when it is generated on the fly and you can't determine that it is a file download from the url? –  May 06 '11 at 07:07

4 Answers

6

I think you should prevent the download dialog from showing at all. Here is one way that might work:

  • The JavaScript code causes your WebBrowser control to navigate to a specific URL (the one that would cause the download dialog to appear).

  • To prevent the WebBrowser control from actually navigating to this URL, attach an event handler to the Navigating event.

  • In the Navigating handler, check whether this is the navigation you want to stop, that is, whether it is the download URL. There should be a recognizable pattern, such as a file extension. Use WebBrowserNavigatingEventArgs.Url for the check.

  • If it is the right URL, stop the navigation by setting the WebBrowserNavigatingEventArgs.Cancel property to true.

  • Continue the download yourself with the HttpWebRequest or WebClient classes.
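
The steps above can be sketched roughly as follows. This is a minimal illustration, not a tested solution: the class name DownloadInterceptor, the method LooksLikeDownload, the extension list, and the targetFolder parameter are all assumptions made for the example. It also assumes the download URL ends in a recognizable file name, and it does not forward the WebBrowser session cookies, so it will not work as-is for pages that require a login.

```csharp
using System;
using System.IO;
using System.Net;
using System.Windows.Forms;

static class DownloadInterceptor
{
    // Hypothetical rule: treat URLs ending in these extensions as downloads.
    public static bool LooksLikeDownload(Uri url)
    {
        string ext = Path.GetExtension(url.AbsolutePath);
        return ext.Equals(".pdf", StringComparison.OrdinalIgnoreCase)
            || ext.Equals(".zip", StringComparison.OrdinalIgnoreCase);
    }

    // Cancels matching navigations and fetches the file with WebClient instead.
    public static void Attach(WebBrowser browser, string targetFolder)
    {
        browser.Navigating += (sender, e) =>
        {
            if (!LooksLikeDownload(e.Url))
            {
                return; // ordinary page navigation, let it proceed
            }

            e.Cancel = true; // the open/save dialog never appears

            string fileName = Path.GetFileName(e.Url.AbsolutePath);
            using (var client = new WebClient())
            {
                // Blocks the UI thread; acceptable for a sketch only.
                client.DownloadFile(e.Url, Path.Combine(targetFolder, fileName));
            }
        };
    }
}
```

Calling DownloadInterceptor.Attach(browser, folder) once, before the click that starts the download, would be enough to wire this up.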

Have a look at this page for more info on the event:
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.navigating.aspx

Yvo
  • 1
    I've already tried getting the url using an HttpDebugger to look at the http request and responses. The url is exactly the same, one being a GET request, the other being a POST request. I also just tried your suggestion without luck. – Sharath Jul 17 '09 at 20:40
  • You might want to use the WebBrowser control to get to the very end, just before the form would be submitted and then extract the POST destination of the form using DOM (get a reference to the HTML document body and from there make your way to the form). – Yvo Jul 18 '09 at 10:52
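
A rough sketch of the idea in the last comment, reading the form's POST destination and fields from the DOM and submitting them outside the browser. The names FormPostSketch and SubmitFirstForm are hypothetical. It assumes the form of interest is the first one on the page, collects only input elements (not select or textarea), and does not forward the browser's cookies. With a doPostBack form, the hidden fields that the script sets before submitting would also have to be set by hand.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Windows.Forms;

static class FormPostSketch
{
    // Reads the first form's action and input values, then posts them.
    // Returns the raw response body, which may be the generated file.
    public static byte[] SubmitFirstForm(WebBrowser browser)
    {
        HtmlElement form = browser.Document.Forms[0];

        // The action attribute may be relative, so resolve it against the page URL.
        var target = new Uri(browser.Url, form.GetAttribute("action"));

        var fields = new NameValueCollection();
        foreach (HtmlElement input in form.GetElementsByTagName("input"))
        {
            string name = input.GetAttribute("name");
            if (!string.IsNullOrEmpty(name))
            {
                fields[name] = input.GetAttribute("value");
            }
        }

        using (var client = new WebClient())
        {
            return client.UploadValues(target, "POST", fields);
        }
    }
}
```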
6

A similar solution is available at http://social.msdn.microsoft.com/Forums/en/csharpgeneral/thread/d338a2c8-96df-4cb0-b8be-c5fbdd7c9202/?prof=required

This works perfectly if there is a direct URL that includes the name of the file to download.

But sometimes a URL generates the file dynamically. In that case the URL does not contain a file name; the website creates the file after the URL is requested, and only then does the open/save dialog appear.

For example, some links generate a PDF file on the fly.

How can this type of URL be handled?
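
One possible approach, sketched under assumptions: when the URL carries no file name, servers commonly supply one in the Content-Disposition response header, so the name can be read from there once the response arrives. The helper below is hypothetical (DownloadNaming and FileNameFromHeader are names invented for this example) and falls back to a caller-supplied name when the header is missing or unusable.

```csharp
using System.IO;
using System.Net.Http.Headers;

static class DownloadNaming
{
    // Extracts a file name from a Content-Disposition header value such as
    // attachment; filename="report.pdf". Returns fallback if none is found.
    public static string FileNameFromHeader(string contentDisposition, string fallback)
    {
        ContentDispositionHeaderValue parsed;
        if (!string.IsNullOrEmpty(contentDisposition)
            && ContentDispositionHeaderValue.TryParse(contentDisposition, out parsed)
            && !string.IsNullOrEmpty(parsed.FileName))
        {
            // FileName keeps the surrounding quotes, so strip them.
            // Path.GetFileName drops any directory part the server sent.
            string name = Path.GetFileName(parsed.FileName.Trim('"'));
            if (!string.IsNullOrEmpty(name))
            {
                return name;
            }
        }
        return fallback;
    }
}
```

With HttpWebResponse the raw header value would come from response.Headers["Content-Disposition"]; the body can then be saved under the returned name.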

Darren
Vikram Gehlot
4

Take a look at Erika Chinchio's article at http://www.codeproject.com/Tips/659004/Download-of-file-with-open-save-dialog-box

I have successfully used it for downloading dynamically generated pdf urls.

  • 2
    Whilst this may theoretically answer the question, [it would be preferable](//meta.stackoverflow.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – Jérémie Bertrand Sep 18 '15 at 14:26
3

Assuming the System.Windows.Forms.WebBrowser was used to access a protected page with a protected link that you want to download:

This code uses the web browser to retrieve the actual link you want to download. It will need to be changed for your specific case. The important part is the documentLinkUrl field, which is used further below.

// Requires: using System.Linq;
var documentLinkUrl = default(Uri);
browser.DocumentCompleted += (object sender, WebBrowserDocumentCompletedEventArgs e) =>
{
    // Find the first anchor whose href points at the download page.
    var downloadLink = browser.Document.ActiveElement
        .GetElementsByTagName("a").OfType<HtmlElement>()
        .First(atag =>
            atag.GetAttribute("href").Contains("DownloadAttachment.aspx"));

    var documentLinkString = downloadLink.GetAttribute("href");
    documentLinkUrl = new Uri(documentLinkString);
};
browser.Navigate(yourProtectedPage);

Now that the web browser has navigated to the protected page and the download link has been acquired, this code downloads the link.

private static async Task DownloadLinkAsync(Uri documentLinkUrl)
{
    // Copy the WebBrowser session cookies so the request is authenticated.
    var cookieString = GetGlobalCookies(documentLinkUrl.AbsoluteUri);
    var cookieContainer = new CookieContainer();
    if (cookieString != null)
    {
        cookieContainer.SetCookies(documentLinkUrl, cookieString);
    }
    using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer })
    using (var client = new HttpClient(handler) { BaseAddress = documentLinkUrl })
    {
        var response = await client.GetAsync(documentLinkUrl);
        if (response.IsSuccessStatusCode)
        {
            var responseStream = await response.Content.ReadAsStreamAsync();
            // The response can be saved from this stream.
        }
    }
}

The code above relies on the GetGlobalCookies method from Erika Chinchio, which can be found in the excellent article linked in @Pedro Leonardo's answer:

[System.Runtime.InteropServices.DllImport("wininet.dll", CharSet = System.Runtime.InteropServices.CharSet.Auto, SetLastError = true)]
static extern bool InternetGetCookieEx(string pchURL, string pchCookieName,
    System.Text.StringBuilder pchCookieData, ref uint pcchCookieData, int dwFlags, IntPtr lpReserved);

// Flag that also returns cookies marked HttpOnly.
const int INTERNET_COOKIE_HTTPONLY = 0x00002000;

// Static so that it can be called from the static DownloadLinkAsync method.
private static string GetGlobalCookies(string uri)
{
    uint uiDataSize = 2048;
    var sbCookieData = new System.Text.StringBuilder((int)uiDataSize);
    if (InternetGetCookieEx(uri, null, sbCookieData, ref uiDataSize,
            INTERNET_COOKIE_HTTPONLY, IntPtr.Zero)
        &&
        sbCookieData.Length > 0)
    {
        // CookieContainer.SetCookies expects a comma-separated list.
        return sbCookieData.ToString().Replace(";", ",");
    }
    return null;
}
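
The step that DownloadLinkAsync leaves open, saving the response stream, could be filled in along these lines. This is only a sketch: StreamSaver, SaveToFileAsync, and the destinationPath parameter are hypothetical names, not part of the original answer.

```csharp
using System.IO;
using System.Threading.Tasks;

static class StreamSaver
{
    // Copies the whole source stream into a new file at destinationPath,
    // overwriting any existing file with that name.
    public static async Task SaveToFileAsync(Stream source, string destinationPath)
    {
        using (var file = File.Create(destinationPath))
        {
            await source.CopyToAsync(file);
        }
    }
}
```

Inside DownloadLinkAsync this would be called as await StreamSaver.SaveToFileAsync(stream, path), where stream is the result of ReadAsStreamAsync and path is wherever the file should be stored.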
Joshcodes