Is there a recommended approach, or are there any documents or articles, that explain the differences between using Parallel.ForEach and starting a Task inside a normal foreach loop, as in the following two cases?

Case 1 - Parallel.ForEach:

Parallel.ForEach(items, item =>
{
    // Do something thread-safe: parse an XML file and then save it
    // to a DB server through a repository
});

Case 2 - Task within foreach:

var tasks = new List<Task>();
foreach (var item in items)
{
    tasks.Add(Task.Factory.StartNew(() =>
    {
        // Do the same thread-safe work as in case 1
    }));
}
Task.WaitAll(tasks.ToArray());
I ran my own tests, and the results show that case 1 performs much better than case 2. The timings were roughly sequential : case 1 : case 2 = 5s : 1s : 4s.

Why is there almost a 1:4 difference between case 1 and case 2? Does this mean we should always use Parallel.ForEach or Parallel.For when we want to run the iterations of a loop in parallel?

mting923
  • Boy do I mistrust your test results... Those numbers should ring alarm bells with you. – usr Aug 10 '13 at 16:22
  • @Will - creating a Task is very different from creating a Thread. The _raison d'etre_ for the TPL. – H H Aug 10 '13 at 16:28
  • @Will, thanks for the input. I think I have the same idea, but why do the results show a difference? And a factor of 4... – mting923 Aug 10 '13 at 16:28
  • I don't understand, you seem to be saying that 4s for case 1 is better than 1s for case 2. – svick Aug 10 '13 at 17:17
  • Excuse my poor English and my lack of experience posting questions here. What I meant was that case 1 needed only about 1 second to finish the job for each loop, while case 2 took about 4 seconds. [Edited question to fix the confusion] – mting923 Aug 12 '13 at 16:47
  • Possible duplicate of [Parallel.ForEach vs Task.Factory.StartNew](http://stackoverflow.com/questions/5009181/parallel-foreach-vs-task-factory-startnew) – Mohammad Feb 16 '16 at 16:00

3 Answers

Parallel.ForEach() creates a small number of Tasks to process the iterations of your loop, rather than one Task per iteration. Tasks are relatively cheap, but they aren't free, so this tends to improve performance. And if the body of your loop executes quickly, the improvement can be really big. This is the most likely explanation for the behavior you're observing.
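One way to check this is to count how many distinct tasks actually run the loop body. The following is a minimal sketch, not a benchmark: the class and method names are made up for illustration, and it assumes that `Task.CurrentId` is set inside a `Parallel.ForEach` body, which is how the loop body is normally hosted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
static class TaskCountDemo
{
    // Counts the distinct tasks that executed the body of a Parallel.ForEach.
    public static int CountTasksUsedByParallelForEach(int itemCount)
    {
        var taskIds = new ConcurrentDictionary<int, bool>();
        Parallel.ForEach(Enumerable.Range(0, itemCount), _ =>
        {
            // Task.CurrentId is null when no task is executing; record it otherwise.
            if (Task.CurrentId is int id) taskIds.TryAdd(id, true);
        });
        return taskIds.Count;
    }

    // Counts the distinct tasks when one task is started per item.
    public static int CountTasksUsedByStartNew(int itemCount)
    {
        var taskIds = new ConcurrentDictionary<int, bool>();
        var tasks = Enumerable.Range(0, itemCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                if (Task.CurrentId is int id) taskIds.TryAdd(id, true);
            }))
            .ToArray();
        Task.WaitAll(tasks);
        return taskIds.Count;
    }

    static void Main()
    {
        const int itemCount = 10000;
        Console.WriteLine("Parallel.ForEach tasks: " + CountTasksUsedByParallelForEach(itemCount));
        Console.WriteLine("StartNew tasks:         " + CountTasksUsedByStartNew(itemCount));
    }
}
```

The second count equals the number of items, because every item gets its own task. The first count depends on the machine and the runtime, but with a trivial body it is typically far smaller.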

svick
  • I am not sure I understand the last two sentences. Which implementation are you referring to with 'the body of your loop': Parallel.ForEach, or the task within the foreach loop? – mting923 Aug 12 '13 at 16:41
  • @mting923 I mean one iteration of the loop, not the whole `Task`. – svick Aug 12 '13 at 19:41

First, the best documentation on the subject is Part V of CLR via C#.

http://www.amazon.com/CLR-via-C-Developer-Reference/dp/0735667454/ref=sr_1_1?ie=UTF8&qid=1376239791&sr=8-1&keywords=clr+via+c%23

Secondly, I would expect Parallel.ForEach to perform better because it not only creates Tasks but also groups them. In Jeffrey Richter's book, he explains that tasks started individually are put on the thread pool's queue, and there is some overhead in locking that queue. To combat this, Tasks have their own queue for the Tasks that they create (strictly speaking, the local queue belongs to the worker thread running the Task). Work can be taken from this local queue without locking!

I would have to read that chapter again (Chapter 27), so I am not sure that Parallel.ForEach works this way, but this is what I would expect it to do.

Locking, he explains, is expensive because it requires accessing a kernel level construct.

In either case, do not expect the items to be processed in sequential order. Parallel.ForEach is even less likely to process them in order than tasks started from a foreach loop, because of the internals described above.
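The grouping idea can also be made explicit. The sketch below is an illustration, not a description of what Parallel.ForEach does internally: it uses `Partitioner.Create` to hand each worker a contiguous range of indices, so the per-item cost is a plain `for` loop iteration rather than a delegate call or a new Task. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
static class ChunkedLoopDemo
{
    // Sums 0..count-1 by processing contiguous index ranges in parallel.
    // count must be positive: Partitioner.Create rejects an empty range.
    public static long SumWithRangePartitioner(int count)
    {
        long total = 0;
        Parallel.ForEach(Partitioner.Create(0, count), range =>
        {
            // Each range is a Tuple<int, int>: inclusive start, exclusive end.
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                local += i;

            // One synchronized update per range instead of one per item.
            Interlocked.Add(ref total, local);
        });
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumWithRangePartitioner(1000000)); // prints 499999500000
    }
}
```

The ranges may finish in any order, which is why the result is combined with an order-independent operation.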

Phillip Scott Givens
  • I haven't begun reading the book yet, but I agree with your point about the 'grouping' done by Parallel.ForEach. In my code I expected one block to be a critical section, but it turned out not to be when I implemented it with Parallel.ForEach. I would like to comment more after I finish reading the chapter. Thanks for the recommendation; this book looks very promising for strengthening my knowledge. – mting923 Aug 12 '13 at 16:37
  • I do not think that he talks about the Task queue mechanism directly in the context of Parallel, but it is either in that chapter or the next. – Phillip Scott Givens Aug 12 '13 at 23:02

How many tasks are you running? Just creating a new task can take a significant amount of time if you loop enough times. For example, in the following code the first block runs in 15 ms and the second block takes over 1 second, even though the second block doesn't even run the tasks. Uncomment the Start call and the time goes up to nearly 3 seconds. The WaitAll adds only a small amount.

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

static class Program
{
    static void Main()
    {
        const int max = 3000000;
        var range = Enumerable.Range(0, max).ToArray();
        {
            var sw = new Stopwatch();
            sw.Start();
            Parallel.ForEach(range, i => { });
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
        {
            var tasks = new Task[max];
            var sw = new Stopwatch();
            sw.Start();
            foreach (var i in range)
            {
                tasks[i] = new Task(()=> { });
                //tasks[i].Start();
            }
            //Task.WaitAll(tasks);
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}
Dax Fohl
  • Thanks for the sample code, I will try it out before giving my input. But I would also get the average time for each loop to compare. – mting923 Aug 12 '13 at 16:45