0

Can someone help me understand why this code might hang? It seems to work most of the time, but will occasionally get stuck. I am unclear if it is on the server side, or the client side. Taking a dump of the stuck process, it appears that it is waiting on this line:

await download.CopyToAsync(fileStream).ConfigureAwait(false);

I am also unclear if I should be limiting the number of httpclient requests I make. This code will be called with up to about 200 files in the array sourceUris, and several of these programs will be running at once on the computer. If I need to time out, what is the best way to do that? Do I need to check the download progress and time out only if we are not making progress?

    public static async Task<bool> DownloadFilesAsync(
        HttpClient httpClient,
        string[] sourceUris,
        string[] destinationFilePaths)
    {
        List<Task<bool>> downloadTasks = new List<Task<bool>>();
        for (int index = 0; index < sourceUris.Length; index++)
        {
            downloadTasks.Add(DownloadFile(httpClient, sourceUris[index], destinationFilePaths[index]));
        }

        await Task.WhenAll(downloadTasks).ConfigureAwait(false);

        return success;
    }        

    private static Task<bool> DownloadFile(HttpClient httpClient, string downloadUrl, string destinationPath)
    {
        return Task.Run(async () =>
        {
            return await DownloadFileAsync(httpClient, downloadUrl, destinationPath);
        });
    }

    private static async Task<bool> DownloadFileAsync(HttpClient client, string downloadUrl, string destinationPath)
    {
        string uniqueDownloadUrl = downloadUrl;
        try
        {
            Directory.CreateDirectory(Path.GetDirectoryName(destinationPath));

            Guid uniqueId = new Guid();

            await RetryHelper.RunWithRetryAsync(async () =>
            {
                // Append a guid to the download url for easy diagnosis in the IIS logs
                uniqueId = Guid.NewGuid();
                uniqueDownloadUrl += $"&uniqueId={uniqueId}";

                await client.DownloadFileTaskAsync(uniqueDownloadUrl, destinationPath).ConfigureAwait(false);
                if (!File.Exists(destinationPath))
                {
                    throw new FileNotFoundException($"File did not download successfully. File does not exist at its destination path of {destinationPath}");
                }
            },
            waitInterval: TimeSpan.FromSeconds(5),
            maxAttempts: 5,
            onErrorHandled: (exception, retryMessage) =>
            {
                // Custom error handling here
            },
            runWithExponentialBackoff: true).ConfigureAwait(false);
        }catch (Exception e)
        {
            // Log a failure message here
            return false;
        }

        return true;
    }        

    public static async Task DownloadFileTaskAsync(this HttpClient client, string uri, string FileName)
    {
        var response = await client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead, default).ConfigureAwait(false);
        if (!response.IsSuccessStatusCode)
        {
            throw new WebException($"Response status code does not indicate success :{response.StatusCode}");
        }

        await DownloadHttpContentAsync(response.Content, FileName).ConfigureAwait(false);
    }

    public static async Task DownloadHttpContentAsync(HttpContent httpContent, string FileName)
    {
        var contentLength = httpContent.Headers.ContentLength;
        using (var download = await httpContent.ReadAsStreamAsync().ConfigureAwait(false))
        using (FileStream fileStream = new FileStream(FileName, FileMode.Create))
        {
            await download.CopyToAsync(fileStream).ConfigureAwait(false);
        }
    }
user2460953
  • 319
  • 1
  • 3
  • 10
  • 1
    You did all code on Download, but none off the CopyToAsync code... what does that do? https://stackoverflow.com/help/minimal-reproducible-example – riffnl Oct 28 '22 at 00:15
  • Also, how is this code called? Which one of these several methods is the one in the call stack? And by "stall", do you mean it is just slower than normal for a bit, or do you mean it never completes? – Stephen Cleary Oct 28 '22 at 00:19
  • https://learn.microsoft.com/en-us/dotnet/api/system.io.stream.copytoasync?view=net-6.0#system-io-stream-copytoasync(system-io-stream) – user2460953 Oct 28 '22 at 07:13
  • Copytoasync is a framework method, and the top function, download files async is the one called. It starts out at several hundred mbps, then levels off at 50 mbps for a few minutes, then goes to zero and never recovers. At this point I think it might be a server issue. I tried experimenting with how many requests are sent at the same time. I generally have about 10-15 processes running that call this method at the same time . If I only download two files concurrently in each process, it finishes, although some of them slowly . If I spawn 15 per process (processor count), then some stall. – user2460953 Oct 28 '22 at 07:23

2 Answers2

0

If you think the concurrent connections is the problem, you can implement something like this to throttle the httpClient to 32 (or whatever) max connections. I'd avoid doing more than the number of cores your CPU has though.

// Max 32 concurrent connections
private static SemaphoreSlim concurrencyThrottler = new SemaphoreSlim(32);
private static Task<bool> DownloadFile(HttpClient httpClient, string downloadUrl, string destinationPath)
{        
    try
    {
        await concurrencyThrottler.WaitAsync();
        return await DownloadFileAsync(httpClient, downloadUrl, destinationPath);
    }
    finally
    {
        concurrencyThrottler.Release();
    }
}

Edit: Since you wanted a Semaphore example:

    static Semaphore throttler = new Semaphore(4, 4, "fileDownloder");
    public static async Task<bool> DownloadFilesAsync(
        HttpClient httpClient,
        string[] sourceUris,
        string[] destinationFilePaths)
    {
        List<Task<bool>> downloadTasks = new List<Task<bool>>();
        for (int index = 0; index < sourceUris.Length; index++)
        {
            throttler.WaitOne();
            try
            {
                return await DownloadFileAsync(httpClient, sourceUris[index], destinationFilePaths[index]);
            }
            finally
            {
                throttler.Release();
            }
        }
        return (await Task.WhenAll(downloadTasks).ConfigureAwait(false)).ToArray().All(t => t);
    }  
corylulu
  • 3,449
  • 1
  • 19
  • 35