Downloading multiple files WebClient

Question

I'm trying to download multiple files, but it's not working as I hoped. Can someone tell me what's wrong with this script? I've tried a lot of things and really don't know what to do anymore.

public static void DownloadFile(string url)
{
    WebClient client = new WebClient();
    var name = url.Substring(url.LastIndexOf('/')).Remove(0, 1);
    foreach (var item in urls)
    {
        client.DownloadFile(item, "C:\\" + name);
    }
}

private void btnGo_Click(object sender, EventArgs e)
{
    urls.Add("url1");
    urls.Add("url2");
    urls.Add("url3");
    Parallel.ForEach(urls,
       new ParallelOptions { MaxDegreeOfParallelism = 10 },
       DownloadFile);
}

I also found this snippet, but it didn't work either:

using (var sr = new StreamReader(HttpWebRequest.Create(url).GetResponse().GetResponseStream()))
{
    using (var sw = new StreamWriter(url.Substring(url.LastIndexOf('/'))))
    {
        sw.Write(sr.ReadToEnd());
    }
}

The answer is here http://stackoverflow.com/questions/6992553/how-do-i-async-download-multiple-files-using-webclient-but-one-at-a-time — G-Man, Jul 11 '12 at 01:36
@GX. Well, it downloads all the files at the same time and they overwrite each other, so you end up with 3 corrupted files. — Yuki Kutsuya, Jul 11 '12 at 01:42

Aidiakapi · Accepted Answer · 2012-07-11T02:13:22.940


I would use a System.Net.HttpWebRequest instead.

This is what the code would look like:

// Needs: using System.Collections.Generic; using System.IO; using System.Net; using System.Threading.Tasks;
// These members go inside your Form class.
private List<string> urls = new List<string>();

private void btnGo_Click(object sender, EventArgs e)
{
    urls.Add("http://199.91.152.106/ua0p3fbc5nlg/gg2w2fq4ljc1nnd/MicroCraft_Beta.zip");
    Parallel.ForEach(urls, new ParallelOptions { MaxDegreeOfParallelism = 10 }, DownloadFile);
}

public static void DownloadFile(string url)
{
    var req = (HttpWebRequest)WebRequest.Create(url);
    // File name = everything after the last '/' in the URL
    var name = url.Substring(url.LastIndexOf('/') + 1);
    using (var res = (HttpWebResponse)req.GetResponse())
    using (var resStream = res.GetResponseStream())
    using (var fs = new FileStream("C:\\" + name, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        // Save to file
        var buffer = new byte[8 * 1024]; // 8 KB buffer
        int len; // Read count
        while ((len = resStream.Read(buffer, 0, buffer.Length)) > 0)
            // Write only the len bytes actually read: Read may return fewer than
            // buffer.Length bytes, and writing the whole buffer would insert stale data.
            fs.Write(buffer, 0, len);
    }
}
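The read/write loop above can also be written with Stream.CopyTo (available since .NET Framework 4.0), which copies in chunks and writes only the bytes each read returns. A minimal sketch; the Downloader class, the SaveStream helper and the targetFolder parameter are hypothetical additions, with SaveStream split out so the copy step can be exercised without a network connection:

```csharp
using System.IO;
using System.Net;

public static class Downloader
{
    // Copies a source stream (e.g. an HTTP response stream) into a new file.
    public static void SaveStream(Stream source, string path)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            // CopyTo loops internally and writes exactly the bytes it reads.
            source.CopyTo(fs);
        }
    }

    // Downloads one URL into targetFolder, named after the last URL segment.
    public static void DownloadFile(string url, string targetFolder)
    {
        var name = url.Substring(url.LastIndexOf('/') + 1);
        var req = (HttpWebRequest)WebRequest.Create(url);
        using (var res = (HttpWebResponse)req.GetResponse())
        using (var resStream = res.GetResponseStream())
        {
            SaveStream(resStream, Path.Combine(targetFolder, name));
        }
    }
}
```

Behaviour is the same as the corrected loop; the only difference is that the buffer management moves into the framework.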

The URL you gave me in the comment points to a server that doesn't implement the HTTP protocol properly, so you'll have to add this to your config file (inside the `<configuration>` element) for it to work: App.config for a desktop application, or Web.config for an ASP.NET site:

<system.net>
    <settings>
        <httpWebRequest useUnsafeHeaderParsing="true" />
    </settings>
</system.net>

As for the name collisions you mentioned in your comment, you can resolve them by replacing `var name = url.Substring(url.LastIndexOf('/')).Remove(0, 1);` with something that gives each download a unique name.
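If the URLs can carry query strings (e.g. ?id=1), taking everything after the last '/' would copy the query into the file name. A sketch that uses Uri.AbsolutePath instead; UrlNames, FileNameFromUrl and the "index.html" fallback are hypothetical choices, not part of the original answer:

```csharp
using System;
using System.IO;

public static class UrlNames
{
    // Returns the last path segment of a URL, ignoring any query string or fragment.
    public static string FileNameFromUrl(string url)
    {
        // AbsolutePath excludes "?query" and "#fragment", e.g. "/dir/file.zip".
        string path = new Uri(url).AbsolutePath;
        string name = Path.GetFileName(path);
        // Hypothetical fallback for URLs that end in '/' and so have no file name.
        return string.IsNullOrEmpty(name) ? "index.html" : name;
    }
}
```

For the MicroCraft_Beta.zip link from the comments this yields MicroCraft_Beta.zip, the same as the Substring approach.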

If you want incremental filenames, you could use this:

// Inside your class:
private static int counter = 0;

// In your method:
var name = "file" + System.Threading.Interlocked.Increment(ref counter) + ".html";
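The counter stays unique under Parallel.ForEach because Interlocked.Increment is atomic: two threads can never observe the same value. A small sketch demonstrating this; NameGenerator and GenerateInParallel are hypothetical names used only for the demonstration:

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class NameGenerator
{
    private static int counter = 0;

    // Atomic increment: concurrent callers always get distinct numbers.
    public static string NextName()
    {
        return "file" + Interlocked.Increment(ref counter) + ".html";
    }

    // Generates 'count' names from parallel workers (demo helper).
    public static string[] GenerateInParallel(int count)
    {
        var names = new ConcurrentBag<string>();
        Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = 10 },
            _ => names.Add(NextName()));
        return names.ToArray();
    }
}
```

A plain counter++ would be a read-modify-write race here and could hand out duplicate names.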

edited Jul 11 '12 at 02:13

answered Jul 11 '12 at 01:35

Aidiakapi


I've found a script that does this; I'll add it to my post, but it didn't work either :(. – Yuki Kutsuya Jul 11 '12 at 01:36
@Aidiakapi hum ... no ... WebClient is what should be used here http://msdn.microsoft.com/en-us/library/system.net.webclient(v=vs.80).aspx Take a look at the DownloadFile function – G-Man Jul 11 '12 at 01:38
@GX. `WebClient` *can* be used for this, but in essence `WebClient` is nothing but a fancy wrapper around IE. `HttpWebRequest` on the other hand is the basic requirement for this. – Aidiakapi Jul 11 '12 at 01:43
When the script downloads the files, they are all corrupted :(. Any idea why? – Yuki Kutsuya Jul 11 '12 at 01:56
@Darkshadw Most likely because the names are colliding. Can you give me some example URLs and the way you want them to be mapped (for example, I want http://www.testurl.com/testfile.html to be mapped to C:\testfile.html)? Then I'll give you the code. For now I've added the code for an incremental filename. – Aidiakapi Jul 11 '12 at 01:59
Here's a link I want to download for example: http://199.91.152.106/ua0p3fbc5nlg/gg2w2fq4ljc1nnd/MicroCraft_Beta.zip – Yuki Kutsuya Jul 11 '12 at 02:00
And what do you want the resulting filename to be? – Aidiakapi Jul 11 '12 at 02:01
@Aidiakapi The filename after the last / – Yuki Kutsuya Jul 11 '12 at 02:07
@Darkshadw Then this should be exactly what you want :) – Aidiakapi Jul 11 '12 at 02:14
@Darkshadw The reason this won't work is that the URL you have sends a redirect to the client, which means that instead of the file, you're downloading http://www.mediafire.com/?gg2w2fq4ljc1nnd. This is MediaFire's anti-bot protection, and it'd probably be difficult to bypass. – Aidiakapi Jul 11 '12 at 02:21
@Aidiakapi I see, but why can Chrome, Firefox, IE, etc. download the file? – Yuki Kutsuya Jul 11 '12 at 02:22
@Darkshadw It's not locked away, it's just obfuscated. It makes use of some nifty techniques in the HTTP protocol to prevent people from writing bots for it. From their Terms: `You agree while using MediaFire Services, that you may not: Use any robot, spider, offline readers, site search and/or retrieval application, or other device to retrieve or index any portion of the Services, with the exception of public search engines;`. Seeing as how this is against their terms, I cannot help you, nor encourage you to bypass this. – Aidiakapi Jul 11 '12 at 02:29
@Aidiakapi Alright, thanks, I didn't even know it was against the rules :o. Well, at least I've learned a few things :), thanks! – Yuki Kutsuya Jul 11 '12 at 02:30
@Aidiakapi, I tried using your code. It downloads the first file, but every subsequent download throws a timeout exception. Browsers (Chrome/IE/Firefox) download all the files, but with the C# code I can't download the second file. – KMX Apr 12 '16 at 00:11
That could be caused by many things, it's pretty much impossible for me to tell. You're probably better off looking at other questions, or asking your own if none of them have your issues. For example http://stackoverflow.com/questions/6992553/how-do-i-async-download-multiple-files-using-webclient-but-one-at-a-time – Aidiakapi Apr 12 '16 at 16:19

Alexei Levenkov · Answer 2 · 2012-07-11T01:58:00.400


Your DownloadFile code saves every URL to the same file, and it is written as if a single call downloads all the files, yet Parallel.ForEach calls it once per URL.

Fixes:

Option 1: Don't use Parallel.ForEach; simply call DownloadFile once, and specify a unique file name for each download, e.g. by taking part of the URL you are downloading or by using random/temporary file names.

Something like this (assuming urls is some sort of IEnumerable<string>):

foreach (var item in urls)
{
   var name = item.Substring(item.LastIndexOf('/')).Remove(0, 1);
   client.DownloadFile(item, "C:\\" + name);
}
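A self-contained sketch of Option 1, assuming the downloads run one at a time through a single WebClient (a WebClient instance does not support concurrent operations). SequentialDownloader, TargetPath and the targetFolder parameter are hypothetical names standing in for the hard-coded "C:\\":

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class SequentialDownloader
{
    // Local path for a URL: everything after the last '/', inside targetFolder.
    public static string TargetPath(string url, string targetFolder)
    {
        return Path.Combine(targetFolder, url.Substring(url.LastIndexOf('/') + 1));
    }

    // Downloads each URL in turn with one WebClient; each file gets its own name.
    public static void DownloadAll(IEnumerable<string> urls, string targetFolder)
    {
        using (var client = new WebClient())
        {
            foreach (var url in urls)
            {
                client.DownloadFile(url, TargetPath(url, targetFolder));
            }
        }
    }
}
```

Computing the name inside the loop, per URL, is the key change compared with the question's code.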

Option 2: Use Parallel.ForEach, but change the DownloadFile code to download only a single file:

public static void DownloadFile(string url)
{
    // One WebClient per call; disposed when the download finishes
    using (var client = new WebClient())
    {
        var name = url.Substring(url.LastIndexOf('/')).Remove(0, 1);
        client.DownloadFile(url, "C:\\" + name);
    }
}

edited Jul 11 '12 at 01:58

answered Jul 11 '12 at 01:47

Alexei Levenkov


As you can read in his code, he's using something to create a different filename. – Aidiakapi Jul 11 '12 at 01:50
And name obviously is a variable, that's based on the url. He supplied the code for it: `var name = url.Substring(url.LastIndexOf('/')).Remove(0, 1);` – Aidiakapi Jul 11 '12 at 01:53
@Aidiakapi, got it ... I was confused by the code; the OP's code actually downloads the same file multiple times. Updating the answer... – Alexei Levenkov Jul 11 '12 at 01:53
@Aidiakapi, thanks. See if updated answer reflects your comments. – Alexei Levenkov Jul 11 '12 at 02:01
I suppose so, though I'm still not a fan of using the bloated `WebClient` for the simple purpose of downloading a file :P. – Aidiakapi Jul 11 '12 at 02:02
@AlexeiLevenkov What exactly is an `IEnumerable<T>`? Is `List<string> urls = new List<string>();` okay to use? – Yuki Kutsuya Jul 11 '12 at 02:04
@Aidiakapi, I'd say a one-liner to download a file is not bloated, but that is purely personal opinion :). +1 to your answer, as it does the correct thing with a close-to-the-metal approach. – Alexei Levenkov Jul 11 '12 at 02:07
@Darkshadw `IEnumerable<T>` is a generic interface for enumerable types (collections), and `List<T>` implements this interface. So yes, `List<T>` is an `IEnumerable<T>` (T is string in this case). – Aidiakapi Jul 11 '12 at 02:08
@Darkshadw. Yes. `List<T>` is `IEnumerable<T>`, which you can easily see if you press F12 to go to the definition of List (or [RTFM](http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx)) – Alexei Levenkov Jul 11 '12 at 02:10
@AlexeiLevenkov Thanks, and I got a bit confused, I thought it was using the WebBrowser control, which is obviously not true. I suppose `WebClient` is just a fancy wrapper around my code :P. (P.S. +1) – Aidiakapi Jul 11 '12 at 02:10
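To illustrate the IEnumerable<T> point from the comments: a method declared to take IEnumerable<string> accepts a List<string> (or a string[]) directly, because both implement IEnumerable<string>. A small sketch with a hypothetical CountNonEmpty helper:

```csharp
using System.Collections.Generic;

public static class EnumerableDemo
{
    // Accepts any sequence of strings: List<string>, string[], HashSet<string>, ...
    public static int CountNonEmpty(IEnumerable<string> items)
    {
        int count = 0;
        foreach (var item in items)
        {
            if (!string.IsNullOrEmpty(item))
                count++;
        }
        return count;
    }
}
```

So the List<string> urls field from the question can be passed anywhere an IEnumerable<string> is expected, including the foreach loop in Option 1.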
