I have a .NET 5 program that downloads a large number of files (1,000+) simultaneously using WebClient.DownloadFile. With the debugger attached it works as expected, downloading around 99% of the files in both Debug and Release configurations. When run without the debugger, it fails to download more than 50% of the files.

All the threads finish before the program ends as they are foreground threads.

The code of the program is:

using System;
using System.IO;
using System.Net;
using System.Threading;

namespace Dumper
{
    internal sealed class Program
    {
        private static void Main(string[] args)
        {
            Directory.CreateDirectory(args[1]);

            foreach (string uri in File.ReadAllLines(args[0]))
            {
                string filePath = Path.Combine(args[1], uri.Split('/')[^1]);

                new Thread((param) =>
                {
                    (string path, string url) = ((string, string))param!;
                    using WebClient webClient = new();

                    try
                    {
                        webClient.DownloadFile(new Uri(url.Replace("%", "%25")), path);

                        Console.WriteLine($"{path} has been successfully downloaded.");
                    }
                    catch (UriFormatException)
                    {
                        throw;
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine($"{path} failed to download: {e}");
                    }
                }).Start((filePath, uri));
            }
        }
    }
}
Kazáni
  • Fails how? Be specific. – mason Sep 11 '21 at 23:02
  • What exactly do you mean by ‘running without the debugger’? `RELEASE` mode? – stuartd Sep 11 '21 at 23:04
  • You are not waiting for the threads to complete, so the program is just going to end at the end of `Main`. 1000 threads with 1000 TCP sockets sounds like a bad idea anyway – Charlieface Sep 11 '21 at 23:12
  • This answer is an example of how to wait for threads to complete: https://stackoverflow.com/a/4190969/1233305 – David784 Sep 11 '21 at 23:35
  • Whether a Thread keeps a program running is controlled by the .IsBackground property. https://learn.microsoft.com/en-us/dotnet/api/system.threading.thread.isbackground?view=net-5.0 These threads are not background threads, so the program will keep running until their thread procs exit. But it's poor form to allow Main() to exit before the program is finished. – David Browne - Microsoft Sep 12 '21 at 00:22
  • Well then what happens? What exception do you get, do you get a console output? – Charlieface Sep 12 '21 at 00:28
  • The very first comment I asked for specific details how it fails. Please take that to heart and edit the exception details into your question. – mason Sep 12 '21 at 00:33
  • The stack trace you linked to seems to be corrupted. The first column is somehow cut off, which makes your whole copy-paste effort suspect. Please include the complete stack trace of a single exception in your question. – Kevin Krumwiede Sep 12 '21 at 16:01

1 Answer

Your problem has little to do with debugging; there are, however, many issues with your code in general. Here is a saner approach that waits for all the downloads to complete.

Note: You could also use Task.WhenAll; however, I have chosen a TPL Dataflow ActionBlock in case you need to manage the degree of parallelism.

Given

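// Namespaces used by the snippets below
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

private static readonly HttpClient _client = new();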

private static string _basePath;

private static async Task ProcessAsync(string input)
{
   try
   {
      var uri = new Uri(Uri.EscapeUriString(input));

      var filePath = Path.Combine(_basePath, input.Split('/')[^1]);

      using var result = await _client
         .GetAsync(uri)
         .ConfigureAwait(false);

      // fail fast
      result.EnsureSuccessStatusCode();

      await using var fileStream = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None, 1024 * 1024, FileOptions.Asynchronous);

      await using var stream = await result.Content
         .ReadAsStreamAsync()
         .ConfigureAwait(false);

      await stream.CopyToAsync(fileStream)
         .ConfigureAwait(false);

      Console.WriteLine($"Downloaded : {uri}");

   }
   catch (Exception e)
   {
      Console.WriteLine(e);
   }
}

Usage

private static async Task Main(string[] args)
{
   var file = args.ElementAtOrDefault(0) ?? @"D:\test.txt";
   _basePath = args.ElementAtOrDefault(1) ?? @"D:\test";

   Directory.CreateDirectory(_basePath);

   var actionBlock = new ActionBlock<string>(ProcessAsync, new ExecutionDataflowBlockOptions()
   {
      EnsureOrdered = false,
      // -1 (DataflowBlockOptions.Unbounded) means no limit;
      // set a positive value if you think the site is throttling you
      MaxDegreeOfParallelism = -1
   });

   foreach (var uri in File.ReadLines(file))
      await actionBlock.SendAsync(uri);

   actionBlock.Complete();
   // wait to make sure everything is completed
   await actionBlock.Completion;

}
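
For comparison, the Task.WhenAll alternative mentioned in the note could be sketched roughly as follows. This is only an illustration, not part of the code above: `ThrottledDownloads` and `RunAllAsync` are hypothetical names, and a SemaphoreSlim is assumed as the way to cap how many downloads run at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDownloads
{
    // Hypothetical helper: runs `work` once per item, with at most
    // `maxConcurrency` invocations in flight at any time.
    public static async Task RunAllAsync(
        IEnumerable<string> items, int maxConcurrency, Func<string, Task> work)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        async Task RunOneAsync(string item)
        {
            // Wait for a free slot before starting the actual work.
            await gate.WaitAsync().ConfigureAwait(false);
            try
            {
                await work(item).ConfigureAwait(false);
            }
            finally
            {
                // Always give the slot back, even if the work throws.
                gate.Release();
            }
        }

        // ToList() starts every task up front; the semaphore decides
        // how many of them are past the gate at once.
        var tasks = items.Select(RunOneAsync).ToList();

        // Completes only after every task has finished.
        await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

With the `ProcessAsync` method from above, the body of `Main` could then be something like `await ThrottledDownloads.RunAllAsync(File.ReadLines(file), 8, ProcessAsync);`, where 8 is an arbitrary limit chosen for illustration.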
TheGeneral
  • @DonAlex1 You're right about how foreground threads work, but to say your code isn't the best understates how *extremely wrong* creating 1000 threads is. The HTTP calls could be timing out waiting for their thread to be scheduled or something like that. Ideally, you want approximately as many threads as you have CPU cores. Put 1000 URLs in a queue and have a small number of threads work that queue. – Kevin Krumwiede Sep 12 '21 at 16:04
  • Or in fact, don't explicitly create threads at all. Things that are I/O bound should use async/await and use the managed thread pool. Only create your own threads for CPU bound tasks. – Kevin Krumwiede Sep 12 '21 at 16:23
  • I'm approaching this practically. The only explanation you need is, "could be a lot of things, and it's not worth trying to figure out." Do it right and see if the problem still exists before trying to solve it. – Kevin Krumwiede Sep 12 '21 at 16:24
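
The suggestion in the comments above (put the URLs in a queue and let a small, fixed number of workers drain it, without creating threads explicitly) might look roughly like this. It is a minimal sketch under stated assumptions: `WorkQueue` and `DrainAsync` are hypothetical names, and `work` stands in for whatever async download routine is used.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WorkQueue
{
    // Hypothetical helper: `workerCount` async workers pull items from a
    // shared queue until it is empty. No Thread objects are created; the
    // awaits run on the managed thread pool.
    public static async Task DrainAsync(
        IEnumerable<string> items, int workerCount, Func<string, Task> work)
    {
        var queue = new ConcurrentQueue<string>(items);

        async Task WorkerAsync()
        {
            // Each worker handles one item at a time, so at most
            // `workerCount` items are in flight.
            while (queue.TryDequeue(out var item))
            {
                try
                {
                    await work(item).ConfigureAwait(false);
                }
                catch (Exception e)
                {
                    // Log and keep going so one failure does not stop the worker.
                    Console.WriteLine($"{item} failed: {e.Message}");
                }
            }
        }

        var workers = Enumerable.Range(0, workerCount)
            .Select(_ => WorkerAsync())
            .ToList();

        // Wait until every worker has found the queue empty.
        await Task.WhenAll(workers).ConfigureAwait(false);
    }
}
```

The worker count is a tuning knob rather than a fixed rule; for I/O-bound downloads it can reasonably be higher than the number of CPU cores.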