9

I'm trying to understand why Parallel.For is able to outperform a number of threads in the following scenario: consider a batch of jobs that can be processed in parallel. While processing these jobs, new work may be added, which then needs to be processed as well. The Parallel.For solution would look as follows:

var jobs = new List<Job> { firstJob };
int startIdx = 0, endIdx = jobs.Count;
while (startIdx < endIdx) {
  Parallel.For(startIdx, endIdx, i => WorkJob(jobs[i]));
  startIdx = endIdx; endIdx = jobs.Count;
}

This means that there are multiple times where the Parallel.For needs to synchronize. Consider a bread-first graph algorithm algorithm; the number of synchronizations would be quite large. Waste of time, no?

Trying the same in the old-fashioned threading approach:

var queue = new ConcurrentQueue<Job> { firstJob };
var threads = new List<Thread>();
var waitHandle = new AutoResetEvent(false);
int numBusy = 0;
for (int i = 0; i < maxThreads; i++) 
  threads.Add(new Thread(new ThreadStart(delegate {
    while (!queue.IsEmpty || numBusy > 0) {
      if (queue.IsEmpty)
        // numbusy > 0 implies more data may arrive
        waitHandle.WaitOne();

      Job job;
      if (queue.TryDequeue(out job)) {
        Interlocked.Increment(ref numBusy);
        WorkJob(job); // WorkJob does a waitHandle.Set() when more work was found
        Interlocked.Decrement(ref numBusy);
      }
    }
    // others are possibly waiting for us to enable more work which won't happen
    waitHandle.Set(); 
})));
threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());

The Parallel.For code is of course much cleaner, but what I cannot comprehend, it's even faster as well! Is the task scheduler just that good? The synchronizations were eleminated, there's no busy waiting, yet the threaded approach is consistently slower (for me). What's going on? Can the threading approach be made faster?

Edit: thanks for all the answers, I wish I could pick multiple ones. I chose to go with the one that also shows an actual possible improvement.

Frank Razenberg
  • 511
  • 4
  • 17
  • 1
    Why would you want to try and make it faster if there is already a cleaner faster solution? – iMortalitySX Oct 25 '12 at 14:04
  • Because there's an obvious deficiency that can be eliminated, I think. – Frank Razenberg Oct 25 '12 at 14:05
  • Close question [Is PLinq Inherently Faster than System.Threading.Tasks.Parallel.ForEach](http://stackoverflow.com/questions/5196293/is-plinq-inherently-faster-than-system-threading-tasks-parallel-foreach) – iMortalitySX Oct 25 '12 at 14:06
  • 2
    You do realize the two snippets of code are as different as day and night, right? One one hand you have a parallel loop over a static, well known number of list jobs (between startIdx and endIdx) and on the other hand you have a a set of threads racing against a queue and a waitable event, with a couple of interlocked spread in between. In other words, the threads code is a [cache invalidation galore](http://www.aristeia.com/TalkNotes/ACCU2011_CPUCaches.pdf). – Remus Rusanu Oct 25 '12 at 14:14

3 Answers3

14

The two code samples are not really the same.

The Parallel.ForEach() will use a limited amount of threads and re-use them. The 2nd sample is already starting way behind by having to create a number of threads. That takes time.

And what is the value of maxThreads ? Very critical, in Parallel.ForEach() it is dynamic.

Is the task scheduler just that good?

It is pretty good. And TPL uses work-stealing and other adaptive technologies. You'll have a hard time to do any better.

H H
  • 263,252
  • 30
  • 330
  • 514
4

Parallel.For doesn't actually break up the items into single units of work. It breaks up all the work (early on) based on the number of threads it plans to use and the number of iterations to be executed. Then has each thread synchronously process that batch (possibly using work stealing or saving some extra items to load-balance near the end). By using this approach the worker threads are virtually never waiting on each other, while your threads are constantly waiting on each other due to the heavy synchronization you're using before/after every single iteration.

On top of that since it's using thread pool threads many of the threads it needs are likely already created, which is another advantage in its favor.

As for synchronization, the entire point of a Parallel.For is that all of the iterations can be done in parallel, so there is almost no synchronization that needs to take place (at least in their code).

Then of course there is the issue of the number of threads. The threadpool has a lot of very good algorithms and heuristics to help it determine how many threads are need at that instant in time, based on the current hardware, the load from other applications, etc. It's possible that you're using too many, or not enough threads.

Also, since the number of items that you have isn't known before you start I would suggest using Parallel.ForEach rather than several Parallel.For loops. It is simply designed for the situation that you're in, so it's heuristics will apply better. (It also makes for even cleaner code.)

BlockingCollection<Job> queue = new BlockingCollection<Job>();

//add jobs to queue, possibly in another thread
//call queue.CompleteAdding() when there are no more jobs to run

Parallel.ForEach(queue.GetConsumingEnumerable(),
    job => job.DoWork());
Servy
  • 202,030
  • 26
  • 332
  • 449
  • Actually it seems that approach is not possible, since you don't know when to call `queue.CompleteAdding()`. This is only when the queue is both empty and nobody is working on any more items. – Frank Razenberg Oct 25 '12 at 14:27
  • @FrankRazenberg Nope. You just call `CompleteAdding` when there are no more items to add. You don't need to wait for it to be empty or for no more items to be in the process of being worked on. `BlockingCollection` will already take care of that. `CompleteAdding` will simply mean that the enumerator won't add any more items to it's internal collection so when it does, eventually, spit out the last one it should break, rather than blocking and waiting for more items. – Servy Oct 25 '12 at 14:30
  • But how will you know when/where to call CompleteAdding()? It can only be called once, right? – Frank Razenberg Oct 25 '12 at 14:31
  • @FrankRazenberg You will call `CompleteAdding` when you have no more items to add. It seems that your processing of a job is what adds more jobs; if that's the case then I would suggest refactoring it into a producer and a consumer if at all possible. Have a single thread/task that just creates `Job` items to process and adds them to the blocking collection and then the code that I've shown can do the actual processing for each job. – Servy Oct 25 '12 at 14:49
  • Only once all jobs are done processing we can know if there is not gonna be any more work (i.e. the while clause of my thread-spawning approach). I guess this is not possible then :( – Frank Razenberg Oct 25 '12 at 15:08
2

Your creating a bunch of new threads and the Parallel.For is using a Threadpool. You'll see better performance if you were utilizing the C# threadpool but there really is no point in doing that.

I would shy away from rolling out your own solution; if there is a corner case where you need customization use the TPL and customize..

Ayubinator
  • 148
  • 1
  • 4