13

Amazingly, using PLINQ did not yield any benefit on a small test case I created; in fact, it was even worse than ordinary LINQ.

Here's the test code:

    int repeatedCount = 10000000;
    private void button1_Click(object sender, EventArgs e)
    {
        var currTime = DateTime.Now;
        var strList = Enumerable.Repeat(10, repeatedCount);
        var result = strList.AsParallel().Sum();

        var currTime2 = DateTime.Now;
        textBox1.Text = (currTime2.Ticks - currTime.Ticks).ToString();

    }

    private void button2_Click(object sender, EventArgs e)
    {
        var currTime = DateTime.Now;
        var strList = Enumerable.Repeat(10, repeatedCount);
        var result = strList.Sum();

        var currTime2 = DateTime.Now;
        textBox2.Text = (currTime2.Ticks - currTime.Ticks).ToString();
    }

The result?

textBox1: 3437500 (ticks)
textBox2: 781250 (ticks)

So, LINQ is taking less time than PLINQ to complete a similar operation!

What am I doing wrong? Or is there a twist that I don't know about?

Edit: I've updated my code to use `Stopwatch`, and the same behavior persisted. To discount the effect of JIT, I tried it several times, clicking button1 and button2 in no particular order. Although the exact times differed, the qualitative behavior remained the same: PLINQ was indeed slower in this case.
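
For reference, the `Stopwatch` version of button1's handler would look roughly like this (button2 is the same, minus the `AsParallel()` call):

    private void button1_Click(object sender, EventArgs e)
    {
        // Same test as before, but timed with Stopwatch (System.Diagnostics)
        // instead of DateTime.Now.
        var watch = Stopwatch.StartNew();

        var strList = Enumerable.Repeat(10, repeatedCount);
        var result = strList.AsParallel().Sum();

        watch.Stop();
        textBox1.Text = watch.ElapsedMilliseconds.ToString();
    }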

Graviton
  • Tip: Use the `Stopwatch` class for measuring performance. It will be more accurate for measuring time periods than `DateTime.Now`. – Greg Jul 28 '10 at 15:33
  • @Anthony Pegram, how so? I can easily see what every type will be. – CaffGeek Jul 28 '10 at 15:40
  • Could you let us know what hardware you're running this on? – Alex Humphrey Jul 28 '10 at 15:53
  • @Anthony, sorry for the bad naming; it was just a simple example I conjured out of thin air, so I would expect that anyone who reads it can understand it at first glance. Nothing rocket science here. – Graviton Jul 28 '10 at 15:54
  • What machine are you running the test on? How many cores? – Kennet Belenky Jul 28 '10 at 16:01
  • @Anthony certainly not the People's Exhibit A, since var is practically impossible to abuse. If you name your variables sensibly, you don't need to know the type to understand the code. (If in doubt, try reading F# code; it's very readable, and you can only use the equivalent of var - almost, at least.) – Rune FS Jul 28 '10 at 16:05
  • @Kennet, Intel core i5. 4 cores – Graviton Jul 28 '10 at 16:18
  • I ran it in Mono; PLINQ was faster – Cui Pengfei 崔鹏飞 Oct 25 '15 at 08:58
  • I've also faced this scenario; you should go through the pitfalls of PLINQ at https://msdn.microsoft.com/en-us/library/dd997403(v=vs.110).aspx . It is normal/expected in many scenarios for PLINQ to be slower than LINQ, so it is always better to measure the time taken and only then decide whether to go for LINQ or PLINQ – techExplorer Mar 10 '16 at 08:21

9 Answers

23

First: Stop using DateTime to measure run time. Use a Stopwatch instead. The test code would look like:

var watch = new Stopwatch();

var strList = Enumerable.Repeat(10, 10000000);

watch.Start();
var result = strList.Sum();
watch.Stop();

Console.WriteLine("Linear: {0}", watch.ElapsedMilliseconds);

watch.Reset();

watch.Start();
var parallelResult = strList.AsParallel().Sum();
watch.Stop();

Console.WriteLine("Parallel: {0}", watch.ElapsedMilliseconds);

Console.ReadKey();

Second: Running things in Parallel adds overhead. In this case, PLINQ has to figure out the best way to divide your collection so that it can Sum the elements safely in parallel. After that, it needs to join the results from the various threads it created and Sum those as well. This isn't a trivial task.

Using the code above, I can see that Sum() takes roughly 95 ms, while .AsParallel().Sum() takes around 185 ms.

Doing a task in Parallel is only a good idea if you gain something by doing it. In this case, Sum is a simple enough task that you don't gain by using PLINQ.
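
For contrast, here's a sketch where the per-element work is heavy enough that AsParallel() is likely to pay off. The slowSquare delegate and its 10 ms delay are invented for illustration; it assumes using System, System.Linq and System.Threading.

// With heavy per-element work, the parallel version should win comfortably.
// The 10 ms sleep is an invented stand-in for real work: I/O, heavy math, etc.
Func<int, int> slowSquare = x => { Thread.Sleep(10); return x * x; };

var numbers = Enumerable.Range(0, 200);

var sequentialSum = numbers.Select(slowSquare).Sum();
var parallelSum   = numbers.AsParallel().Select(slowSquare).Sum();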

Justin Niessner
  • +1 @Justin Niessner Very crisp answer. @Ngu Soon Hui PLINQ will be handy, but if you don't understand it well it can be evil as well. I would suggest you look at cache false sharing and GC throttling before you start playing with PLINQ. – Prince Ashitaka Oct 18 '10 at 09:34
  • Apparently the system's inability to recognize that PLINQ is slower here contradicts the statement in this post http://stackoverflow.com/questions/2893637/is-it-ok-to-try-to-use-plinq-in-all-linq-queries that the system is able to pick LINQ or PLINQ depending on which one is faster. – Schultz9999 Nov 15 '11 at 05:27
  • @CuiPengFei - That very well may be the case. There's no guarantee that each platform/machine runs the code with the same performance. – Justin Niessner Oct 26 '15 at 01:18
22

This is a classic mistake -- thinking, "I'll run a simple test to compare the performance of this single-threaded code with this multi-threaded code."

A simple test is the worst kind of test you can run to measure multi-threaded performance.

Typically, parallelizing some operation yields a performance benefit when the steps you're parallelizing require substantial work. When the steps are simple -- as in, quick* -- the overhead of parallelizing your work ends up dwarfing the minuscule performance gain you would have otherwise gotten.


Consider this analogy.

You're constructing a building. If you have one worker, he has to lay bricks one by one until he's made one wall, then do the same for the next wall, and so on until all walls are built and connected. This is a slow and laborious task that could benefit from parallelization.

The right way to do this would be to parallelize the wall building -- hire, say, 3 more workers, and have each worker construct his own wall so that 4 walls can be built simultaneously. The time it takes to find the 3 extra workers and assign them their tasks is insignificant in comparison to the savings you get by getting 4 walls up in the amount of time it would have previously taken to build 1.

The wrong way to do it would be to parallelize the brick laying -- hire about a thousand more workers and have each worker responsible for laying a single brick at a time. You may think, "If one worker can lay 2 bricks per minute, then a thousand workers should be able to lay 2000 bricks per minute, so I'll finish this job in no time!" But the reality is that by parallelizing your workload at such a microscopic level, you're wasting a tremendous amount of energy gathering and coordinating all of your workers, assigning tasks to them ("lay this brick right there"), making sure no one's work is interfering with anyone else's, etc.

So the moral of this analogy is: in general, use parallelization to split up the substantial units of work (like walls), but leave the insubstantial units (like bricks) to be handled in the usual sequential manner.


*For this reason, you can actually make a pretty good approximation of the performance gain of parallelization in a more work-intensive context by taking any fast-executing code and adding Thread.Sleep(100) (or some other random number) to the end of it. Suddenly sequential executions of this code will be slowed down by 100 ms per iteration, while parallel executions will be slowed significantly less.
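
A rough sketch of that trick (the 100 ms delay and the element count are arbitrary; it assumes using System.Linq and System.Threading):

// Each element now carries ~100 ms of simulated work, so the cost of
// parallelisation becomes negligible by comparison.
var items = Enumerable.Range(0, 32);

var sequentialSum = items
    .Select(i => { Thread.Sleep(100); return i; })
    .Sum();                                   // roughly 32 * 100 ms

var parallelSum = items
    .AsParallel()
    .Select(i => { Thread.Sleep(100); return i; })
    .Sum();                                   // roughly (32 / core count) * 100 ms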

Dan Tao
9

Others have pointed out some flaws in your benchmarks. Here's a short console app to make it simpler:

using System;
using System.Diagnostics;
using System.Linq;

public class Test
{
    const int Iterations = 1000000000;

    static void Main()
    {
        // Make sure everything's JITted
        Time(Sequential, 1);
        Time(Parallel, 1);
        Time(Parallel2, 1);
        // Now run the real tests
        Time(Sequential, Iterations);
        Time(Parallel,   Iterations);
        Time(Parallel2,  Iterations);
    }

    static void Time(Func<int, int> action, int count)
    {
        GC.Collect();
        Stopwatch sw = Stopwatch.StartNew();
        int check = action(count);
        if (count != check)
        {
            Console.WriteLine("Check for {0} failed!", action.Method.Name);
        }
        sw.Stop();
        Console.WriteLine("Time for {0} with count={1}: {2}ms",
                          action.Method.Name, count,
                          (long) sw.ElapsedMilliseconds);
    }

    static int Sequential(int count)
    {
        var strList = Enumerable.Repeat(1, count);
        return strList.Sum();
    }

    static int Parallel(int count)
    {
        var strList = Enumerable.Repeat(1, count);
        return strList.AsParallel().Sum();
    }

    static int Parallel2(int count)
    {
        var strList = ParallelEnumerable.Repeat(1, count);
        return strList.Sum();
    }
}

Compilation:

csc /o+ /debug- Test.cs

Results on my quad-core i7 laptop (it runs up to 2 cores fast, or all 4 cores more slowly). Basically ParallelEnumerable.Repeat wins, followed by the sequential version, followed by parallelising the normal Enumerable.Repeat.

Time for Sequential with count=1: 117ms
Time for Parallel with count=1: 181ms
Time for Parallel2 with count=1: 12ms
Time for Sequential with count=1000000000: 9152ms
Time for Parallel with count=1000000000: 44144ms
Time for Parallel2 with count=1000000000: 3154ms

Note that earlier versions of this answer were embarrassingly flawed by having the wrong number of elements - I'm much more confident in the results above.

Jon Skeet
  • I don't see any fundamental difference between your benchmark and mine, but why are our results so substantially different? – Graviton Jul 28 '10 at 15:52
  • @Ngu: Are you running your code under the debugger or using the Intellitrace feature of VS2010? Those can dramatically skew results for performance timings. Ideally, try running a release build on the command line - not from VS. – LBushkin Jul 28 '10 at 15:55
  • @Ngu: 1) Mine's not running in the context of a UI. I don't know if that makes any difference, but why give the benchmark more work to do? 2) Mine is clearly removing the JIT from the equation. 3) Mine gives the tasks more work to do (by a factor of 100). The Stopwatch/DateTime difference is a *bit* of a red herring, IMO - by the time you've got reasonably long time periods involved (multiple seconds) the imprecision of DateTime is going to be insignificant. – Jon Skeet Jul 28 '10 at 15:55
  • Interesting. I added some code to make sure everything was JIT'd in my console app as well and on my Core 2 Duo P8600, AsParallel is still running nearly twice as slow. Then I tried your console app and saw the same results as using mine... – Justin Niessner Jul 28 '10 at 15:58
  • @Justin: Do you mean my app is showing slower Parallel results on your box? What times are you getting? – Jon Skeet Jul 28 '10 at 16:06
  • @Jon Skeet - Your app and my app had similar results. For yours I was seeing 9.3 second sequential times vs. 22.2 second parallel. – Justin Niessner Jul 28 '10 at 16:11
  • @Jon, I tested your code and indeed the result was the same as yours-- PLINQ did run faster on your benchmark – Graviton Jul 28 '10 at 16:11
  • @Ngu: So now it's just Justin's machine we're confused by :) Btw, I've added an extra benchmark for `ParallelEnumerable.Repeat`. – Jon Skeet Jul 28 '10 at 16:13
  • @Jon, you said that **Mine is clearly removing the JIT from the equation**, may I know how this was actually done on your code? – Graviton Jul 28 '10 at 16:13
  • @Justin: What did your processor usage look like when running the benchmarks? Was it using all cores? – Jon Skeet Jul 28 '10 at 16:14
  • @Ngu: It's in the comments: run it once with a tiny count just to make sure all the code has been JITted, then run it again with the real load. – Jon Skeet Jul 28 '10 at 16:14
  • @Jon Skeet - All cores pegged at 100% for the Parallel test. For the sequential the first core hit about 50% utilization. – Justin Niessner Jul 28 '10 at 16:17
  • @Jon, I tested your code with `GC.Collect()`, `Time(Sequential, 1)` and `Time(Parallel, 1)` commented out, yet your result still stood. It seems to me that it has nothing to do with JIT and the like. – Graviton Jul 28 '10 at 16:17
  • @Ngu: In this case it might not - these are just things I do by default when benchmarking, to reduce the chances of interference. – Jon Skeet Jul 28 '10 at 16:21
  • @Jon, your benchmark has a bug; the `Sequential` method uses 1000000000 elements, but the `Parallel` method uses only `100000000` elements - in other words, the `Parallel` method gets only 1/10 of the elements of the `Sequential` one! If both use the same element count, the `Sequential` method still runs faster, regardless of whether it's in my code or your code. – Graviton Aug 07 '10 at 03:08
  • Anyway `ParallelEnumerable.Repeat` really outperforms the rest. But I'm not sure how I can use it. – Graviton Aug 07 '10 at 03:12
  • @Ngu: Yikes! What an embarrassing find! Will edit it and leave the results. – Jon Skeet Aug 07 '10 at 08:37
1

Is it possible you are not taking into account JIT time? You should run your test twice and discard the first set of results.

Also, you shouldn't use DateTime for performance timing; use the Stopwatch class instead:

// Stopwatch.StartNew() is static; it creates and starts the watch in one call.
var swatch = Stopwatch.StartNew();

var strList = Enumerable.Repeat(10, repeatedCount);
var result = strList.AsParallel().Sum();

swatch.Stop();
textBox1.Text = swatch.Elapsed.ToString();

PLINQ does add some overhead to the processing of a sequence, but the magnitude of the difference in your case seems excessive. PLINQ makes sense when the overhead cost is outweighed by the benefit of running the logic on multiple cores/CPUs. If you don't have multiple cores, running the processing in parallel offers no real advantage - and PLINQ should detect such a case and perform the processing sequentially.
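
If you want to influence that decision yourself, PLINQ exposes operators for it. A sketch (the values are arbitrary; it reuses the question's repeatedCount field):

var numbers = Enumerable.Repeat(10, repeatedCount);

var sum = numbers
    .AsParallel()
    .WithDegreeOfParallelism(Environment.ProcessorCount)        // cap the number of workers
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)  // don't fall back to sequential
    .Sum();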

EDIT: When creating embedded performance tests of this kind, you should make sure that you are not running them under the debugger, or with Intellitrace enabled, as those can significantly skew performance timings.

LBushkin
1

Something more important that I didn't see mentioned is that .AsParallel performs differently depending on the collection used.

In my tests, PLINQ is faster than LINQ when it is NOT used on a plain IEnumerable (Enumerable.Repeat):

  29ms  PLINQ  ParallelQuery
  30ms   LINQ  ParallelQuery
  30ms  PLINQ  Array
  38ms  PLINQ  List
 163ms   LINQ  IEnumerable
 211ms   LINQ  Array
 213ms   LINQ  List
 273ms  PLINQ  IEnumerable
4 processors

The code is in VB, but it is provided to show that using .ToArray made the PLINQ version a few times faster:

    Dim test = Function(LINQ As Action, PLINQ As Action, type As String)
                   Dim sw1 = Stopwatch.StartNew : LINQ() : Dim ts1 = sw1.ElapsedMilliseconds
                   Dim sw2 = Stopwatch.StartNew : PLINQ() : Dim ts2 = sw2.ElapsedMilliseconds
                   Return {String.Format("{0,4}ms   LINQ  {1}", ts1, type), String.Format("{0,4}ms  PLINQ  {1}", ts2, type)}
               End Function

    Dim results = New List(Of String) From {Environment.ProcessorCount & " processors"}
    Dim count = 12345678, iList = Enumerable.Repeat(1, count)

    With iList : results.AddRange(test(Sub() .Sum(), Sub() .AsParallel.Sum(), "IEnumerable")) : End With
    With iList.ToArray : results.AddRange(test(Sub() .Sum(), Sub() .AsParallel.Sum(), "Array")) : End With
    With iList.ToList : results.AddRange(test(Sub() .Sum(), Sub() .AsParallel.Sum(), "List")) : End With
    With ParallelEnumerable.Repeat(1, count) : results.AddRange(test(Sub() .Sum(), Sub() .AsParallel.Sum(), "ParallelQuery")) : End With

    MessageBox.Show(String.Join(Environment.NewLine, From l In results Order By l))

Running the tests in a different order gives slightly different results, so having each test on one line makes it easier for me to move them up and down.
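
A rough C# equivalent of the same idea (materialize the source first so PLINQ gets an indexable collection to partition):

    int count = 12345678;
    var source = Enumerable.Repeat(1, count);
    var array  = source.ToArray();

    var sumFromEnumerable = source.AsParallel().Sum(); // chunk-partitioned IEnumerable
    var sumFromArray      = array.AsParallel().Sum();  // range-partitioned array, typically faster here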

Slai
0

That may indeed be the case, because you are increasing the number of context switches without performing any action that would benefit from having threads wait on something like I/O completion. It will be even worse if you are running on a single-CPU box.

Otávio Décio
0

I'd recommend using the Stopwatch class for timing metrics. In your case it's a better measure of the interval.

Matthew Sposato
0

Please read the Side Effects section of this article.

http://msdn.microsoft.com/en-us/magazine/cc163329.aspx

I think you can run into many situations where PLINQ has additional data-processing patterns you must understand before assuming that it will always have faster response times.
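
A sketch of the kind of side effect that section warns about (not taken from the article itself):

// Unsafe: List<T> is not thread-safe, so mutating it from a parallel
// query is a race condition that can corrupt the list or drop items.
var squares = new List<int>();
Enumerable.Range(0, 1000)
          .AsParallel()
          .ForAll(i => squares.Add(i * i));

// Safer: keep the query side-effect free and let PLINQ build the result.
var safeSquares = Enumerable.Range(0, 1000)
                            .AsParallel()
                            .Select(i => i * i)
                            .ToList();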

apolfj
0

Justin's comment about overhead is exactly right.

Just something to consider when writing concurrent software in general, beyond the use of PLINQ:

You always need to be thinking about the "granularity" of your work items. Some problems are very well suited to parallelization because they can be "chunked" at a very high level, like raytracing entire frames concurrently (these sorts of problems are called embarrassingly parallel). When there are very large "chunks" of work, then the overhead of creating and managing multiple threads becomes negligible compared to the actual work that you want to get done.

PLINQ makes concurrent programming easier, but it doesn't mean that you can ignore thinking about the granularity of your work.
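
As a sketch of the granularity point, in the spirit of the raytracing example (the frames, RenderFrame and ProcessPixel names are made up):

// Coarse-grained: one work item per frame (a "wall") - coordination cost is
// negligible compared to the rendering work each item does.
Parallel.ForEach(frames, frame => RenderFrame(frame));

// Too fine-grained: one work item per pixel (a "brick") - the per-item
// scheduling overhead can swamp the trivial amount of work per pixel.
foreach (var frame in frames)
{
    frame.Pixels.AsParallel().ForAll(p => ProcessPixel(p));
}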

Matt