I need to use WebClient in a project to split a file into multiple parts and upload them in parallel. So far, I'm able to upload the parts one at a time, but am unsure as to how to upload them in parallel.

I have an UploadPart method which looks like this:

private async Task<PartETag> UploadPart(string filePath, string preSignedUrl, int partNumber)
{
    using WebClient wc = new(); // dispose the client once this part has been uploaded
    wc.UploadProgressChanged += WebClientUploadProgressChanged;
    wc.UploadFileCompleted += WebClientUploadCompleted;
    _ = await wc.UploadFileTaskAsync(new Uri(preSignedUrl), "PUT", filePath);

    // Obtain the WebHeaderCollection instance containing the header name/value pair from the response.
    WebHeaderCollection myWebHeaderCollection = wc.ResponseHeaders;
    string formattedETag = myWebHeaderCollection.GetValues("ETag").FirstOrDefault().Replace(@"""", "");
    PartETag partETag = new(partNumber, formattedETag);

    return partETag;
}

It's called inside a foreach loop:

foreach (var part in parts)
{
    var partETag = await UploadPart(part.FilePath, part.PresignedUrl, part.Number);
    partETags.Add(partETag);
}

How can I modify this so that I upload parts in parallel (up to a max of 10 parts at once) while still returning the PartETag values in the response header?

JMR
  • Any reason why you _"need to use WebClient"_? If you look at its docs, it clearly says : _"We don't recommend that you use the `WebClient` class for new development. Instead, use the `System.Net.Http.HttpClient` class"_ https://learn.microsoft.com/en-us/dotnet/api/system.net.webclient?view=net-5.0#remarks – Flydog57 Jun 29 '21 at 16:35
  • I'm making modifications to an existing codebase which uses WebClient already, and need to update it so it supports uploading multipart files in parallel. – JMR Jun 29 '21 at 16:44
  • In your foreach loop, you start an `UploadPart` task and `await` it. That will do them one at a time. Consider using `Parallel.ForEach` or just starting them all up and awaiting a call to `Task.WhenAll`. If you want to limit it to 10 at a time, I _think_ you can do that with `Parallel.ForEach`. Doing it by hand would be more work. – Flydog57 Jun 29 '21 at 16:48
  • Check out `Task.WhenAll` https://stackoverflow.com/questions/17197699/awaiting-multiple-tasks-with-different-results – abdusco Jun 29 '21 at 16:50

1 Answer

This is a perfect scenario for TPL Dataflow:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

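// Assumed shape: Part exposes FilePath, PresignedUrl and Number, as in the question's loop.
var parts = new List<Part>(); // populate with the parts to upload
var partEtags = new List<PartETag>();

// Uploads parts asynchronously, at most 10 at a time.
var transformBlock = new TransformBlock<Part, PartETag>
(
    async part => await UploadPart(part.FilePath, part.PresignedUrl, part.Number),
    new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 10}
);

// An ActionBlock processes one item at a time by default, so adding to the list is safe.
var actionBlock = new ActionBlock<PartETag>(partETag => partEtags.Add(partETag));

// Propagate completion so that finishing the transform block also finishes the action block.
transformBlock.LinkTo(actionBlock, new DataflowLinkOptions {PropagateCompletion = true});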

foreach (Part part in parts)
{
    transformBlock.Post(part);
}

transformBlock.Complete();

await actionBlock.Completion;

I made some assumptions about your classes since you didn't show all of your code. The parts list at the top obviously needs to have instances in it.

This code creates a dataflow pipeline that does the work asynchronously and caps parallel executions at 10. The blocks are linked with completion propagated, so awaiting the completion of the action block ensures that every part has finished.

Once that's done, your partEtags list will contain all of your results.
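If you would rather not take a dependency on TPL Dataflow, the `Task.WhenAll` idea from the comments can be throttled with a `SemaphoreSlim`. The following is only a sketch under some assumptions: `Part` and `PartETag` are hypothetical stand-ins for your own types, and `ParallelUploader`, `UploadAllAsync` and the `uploadPart` delegate are names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the types used in the question.
public record Part(string FilePath, string PresignedUrl, int Number);
public record PartETag(int PartNumber, string ETag);

public static class ParallelUploader
{
    // Runs uploadPart for every part, with at most maxConcurrency uploads in flight.
    public static async Task<PartETag[]> UploadAllAsync(
        IEnumerable<Part> parts,
        Func<Part, Task<PartETag>> uploadPart,
        int maxConcurrency = 10)
    {
        using var throttle = new SemaphoreSlim(maxConcurrency);

        async Task<PartETag> UploadThrottledAsync(Part part)
        {
            // Wait for a free slot before starting this upload.
            await throttle.WaitAsync();
            try
            {
                return await uploadPart(part);
            }
            finally
            {
                // Free the slot even if the upload throws.
                throttle.Release();
            }
        }

        // ToList() creates all tasks up front; the semaphore decides
        // how many of them are actually uploading at any moment.
        var tasks = parts.Select(UploadThrottledAsync).ToList();

        // WhenAll returns the results in the same order as the tasks.
        return await Task.WhenAll(tasks);
    }
}
```

With the method from the question, a call might look like `var partETags = await ParallelUploader.UploadAllAsync(parts, p => UploadPart(p.FilePath, p.PresignedUrl, p.Number));`. Because `Task.WhenAll` returns results in task order, the ETags line up with the order of `parts`.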

David Peden