4

I have a few different ways of upload entire directories to Amazon S3 within my application depending on what options are selected. Currently one of the options will perform an upload of multiple directories in parallel. I'm not sure if this is a good idea as in some cases it sped up the upload and other cases it slowed it down. The speed up appears to be when there are a bunch of small directories, but it slows down if there are large directories in the batch. I'm using the parallel ForEach loop seen below and utilizing the AWS API's TransferUtility.UploadDirectoryAsync() method as such:

Parallel.ForEach(dirs,myParallelOptions, 
                   async dir => { await MyUploadMethodAsync(dir) };

Where the TransferUtility.UploadDirectoryAsync() method is within MyUploadMethodAsync(). The TransferUtility's upload methods all perform parallel uploads of parts a single file (if the size is big enough to do so), so performing a parallel upload of the directory as well may be overkill. Obviously we are still limited to the amount of bandwidth available so this might be a waste and I just should just use a regular foreach loop with the UploadDirectoryAsync() method. Can anyone provide some insight on if this is bad case for parallelization?

svick
  • 236,525
  • 50
  • 385
  • 514
JNYRanger
  • 6,829
  • 12
  • 53
  • 81

1 Answers1

5

Did you actually test this? The way you're using it, Parallel.ForEach may return well before any of MyUploadMethodAsync is completed, because of the async lambda:

Parallel.ForEach(dirs,myParallelOptions, 
    async dir => { await MyUploadMethodAsync(dir) };

Parallel.ForEach is suited for CPU-bound tasks. For IO-bound tasks, you are probably looking for something like this:

var tasks = dirs.Select(dir => MyUploadMethodAsync(dir));
await Task.WhenAll(tasks);
// or Task.WaitAll(tasks) if you need a blocking wait
noseratio
  • 59,932
  • 34
  • 208
  • 486
  • I did test it a few times, but as I mentioned in my post I had mixed results depending on the directory sizes. You are absolutely correct that the `Parallel.ForEach` returned prior to each of the `MyUploadMethodAsync` methods. I saw this in my tracing, but that was expected since they're async operations. I'll try this version and see if it improves performance. – JNYRanger Jan 22 '14 at 15:43
  • @JNYRanger, what you're passing to `Parallel.ForEach` is `async void` lambda, not even `async Task` lambda. When `Parallel.ForEach` returns, you have no way of tracking the completion status, check the result and handle any exceptions possibly thrown by each `MyUploadMethodAsync` (and most of them will still be pending). – noseratio Jan 22 '14 at 18:20
  • 1
    Thanks a lot, this did the trick. I was able to not only handle errors better, but improved performance by over 25%! I went against my own rules making `async void` methods. Never again. – JNYRanger Jan 22 '14 at 19:52
  • from this post http://stackoverflow.com/questions/8505815/how-to-properly-parallelise-job-heavily-relying-on-i-o TaskContinuationOptions.AttachedToParent will solve your other problem – Gomes Aug 26 '16 at 15:21