I have a WebRole running on a small instance. This WebRole has a method that uploads a large number of files to BLOB storage. According to the Azure instance specs, a small instance has only 1 core. So when uploading those blobs, will Parallel.ForEach give me any benefit over a regular foreach?
3 Answers
You would be much better served by focusing on using the async versions of the blob storage and/or Stream APIs so that you are I/O bound rather than CPU bound. Anywhere there is a BeginXXX API you should use it by wrapping it with Task.Factory.FromAsync and then using a continuation from there. In your specific case you should leverage CloudBlob.BeginUploadFromStream. How you get the stream initially is just as important, so look for async APIs on that end too.
The only thing that may hold you back from using a small instance after that is that it's capped at 100 Mbps, whereas a medium instance gets 200 Mbps. Then again, you can always leverage the elasticity factor: increase the role count when you need more processing power and scale back down when things calm down.
Here's an example of how you would call BeginUploadFromStream using FromAsync. Now, as far as coordinating concurrent processing goes: since you're kicking off async tasks, you can't count on Parallel::ForEach to constrain the max concurrency for you. Instead you use a regular foreach on the original thread with a Semaphore to limit concurrency. This provides the equivalent of MaxDegreeOfParallelism:
// Set up a semaphore to constrain the max # of concurrent "thing"s we will process
int maxConcurrency = ... read from config ...
Semaphore maxConcurrentThingsToProcess = new Semaphore(maxConcurrency, maxConcurrency);

// The current thread enumerates and dispatches the I/O work asynchronously;
// this is the only CPU resource we're holding during the async I/O
foreach(Thing thing in myThings)
{
    // Make sure we haven't reached max concurrency yet
    maxConcurrentThingsToProcess.WaitOne();

    try
    {
        Stream mySourceStream = ... get the source stream from somewhere ...;
        CloudBlob myCloudBlob = ... get the blob from somewhere ...;

        // Begin uploading the stream asynchronously
        Task uploadStreamTask = Task.Factory.FromAsync(
            myCloudBlob.BeginUploadFromStream,
            myCloudBlob.EndUploadFromStream,
            mySourceStream,
            null);

        // Set up a continuation that fires when the upload completes
        // (regardless of success or failure)
        uploadStreamTask.ContinueWith(uploadStreamAntecedent =>
        {
            try
            {
                // Upload completed here; do any cleanup/post-processing
            }
            finally
            {
                // Release the semaphore so the next thing can be processed
                maxConcurrentThingsToProcess.Release();
            }
        });
    }
    catch
    {
        // Something went wrong starting to process this "thing"; release the semaphore
        maxConcurrentThingsToProcess.Release();
        throw;
    }
}
Now, in this sample I am not showing how you would also get the source stream asynchronously, but if, for example, you were downloading that stream from a URL somewhere else, you would want to kick that off asynchronously as well and chain the start of the async upload into a continuation on that, along the lines of the sketch below.
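For illustration, a minimal sketch of that chaining, assuming the source is fetched over HTTP with WebRequest's BeginGetResponse/EndGetResponse pair; downloadUrl and myCloudBlob are hypothetical placeholders, not part of the original sample:

// Hypothetical sketch: download a source stream asynchronously, then chain the
// blob upload onto its completion. Uses the same CloudBlob APIs as above.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(downloadUrl);

Task<WebResponse> downloadTask = Task.Factory.FromAsync<WebResponse>(
    request.BeginGetResponse,
    request.EndGetResponse,
    null);

// When the download completes, start the upload as a continuation so no thread
// is blocked while either I/O operation is in flight. Unwrap flattens the
// nested Task<Task> into a single Task representing the whole chain.
Task uploadTask = downloadTask.ContinueWith(downloadAntecedent =>
{
    Stream sourceStream = downloadAntecedent.Result.GetResponseStream();

    return Task.Factory.FromAsync(
        myCloudBlob.BeginUploadFromStream,
        myCloudBlob.EndUploadFromStream,
        sourceStream,
        null);
}).Unwrap();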
Believe me, I know this is more code than just doing a simple Parallel::ForEach, but Parallel::ForEach exists to make concurrency for CPU-bound work easy. When it comes to I/O, using the async APIs is the only way to achieve maximum I/O throughput while minimizing CPU resources.
- If I use Parallel.ForEach(), the code in the loop is very straightforward: I just call CloudBlob.UploadFromStream(). But when using the async methods it's less clear. Should I just wrap CloudBlob.BeginUploadFromStream() in a Task, and inside a regular foreach loop generate a task for each file, calling Task.WaitAll() at the end of the loop? (If you can provide a little code snippet that would be great) – Yaron Levi Dec 09 '11 at 09:21
- No, this isn't the same thing. If you make a synchronous call inside the Parallel::ForEach, you are blocking that worker thread while the I/O is occurring, which ends up consuming precious CPU resources. It's true that using the async pattern requires a little more work, but the benefits are more than worth it, especially when I/O is involved. This is why C# 5.0 is adding the async keyword and .NET 4.5's BCL is being redesigned around this new, Task-based asynchronous pattern. Even Windows 8 APIs are now following this pattern judiciously. I will add a simple example. – Drew Marsh Dec 09 '11 at 17:04
- I agree with @Drew, if you are looking for maximum parallelism rather than simplicity. Parallel::ForEach converts everything into Tasks, and the thread scheduling that then occurs will not immediately allocate a thread for each iteration of your loop. Rather, it will assume that your threads each require a non-trivial amount of CPU time, and hence it should start with only a few threads (e.g. 2 per core). Eventually it will work out that all your Tasks are waiting on I/O, and it will allocate more threads, but this takes time. – Oliver Bock Dec 11 '11 at 22:33
- @DrewMarsh I modified your code sample to my needs. This code sits in a method called Upload(). The problem now is that Upload() is called from a higher method called Process(). Inside Process() I want to wait until Upload() finishes. So in Upload(), after the foreach, I call await TaskEx.WhenAll(). But now Process() needs to await Upload(), and it can't (it's a method in a WCF RIA domain service). How can I make Process() wait until Upload() finishes? – Yaron Levi Dec 14 '11 at 08:10
- How do I get the following code transformed into the async equivalent? blob.UploadFromStream(stream, null, new BlobRequestOptions { RetryPolicy = new LinearRetry(TimeSpan.FromMilliseconds(100), 3) }); Not sure how to add the parameters for the Begin call. – Poul K. Sørensen Apr 21 '13 at 15:06
The number of cores doesn't directly correlate to the number of threads spawned by Parallel.ForEach().
About a year ago, David Aiken did a very informal test with some blob+table access, with and without Parallel.ForEach(), on a Small instance. You can see the results here. In that case, there was a measured improvement, as the work was not CPU-bound. I suspect you'll see an improvement in performance as well, since you're uploading a large number of objects to blob storage.
Yes, it will, because each of your uploads will be network-bound, so the scheduler can share your single core amongst them. (This, after all, is how single-core, single-CPU computers get more than one thing done at a time.)
You could also use the asynchronous blob upload functions for a similar effect; a minimal sketch follows.
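For example, a sketch under the same assumptions as the accepted answer's code; blobsAndStreams is a hypothetical collection pairing each CloudBlob with its source Stream:

// Hypothetical sketch: start every upload asynchronously, then block once for
// all of them. No worker thread is held while the network I/O is in flight.
List<Task> uploadTasks = new List<Task>();

foreach (var pair in blobsAndStreams)
{
    uploadTasks.Add(Task.Factory.FromAsync(
        pair.Blob.BeginUploadFromStream,
        pair.Blob.EndUploadFromStream,
        pair.Stream,
        null));
}

// Wait for every upload to finish (consider a semaphore, as in the accepted
// answer, if the number of simultaneous uploads needs to be capped)
Task.WaitAll(uploadTasks.ToArray());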