Performance Linq

Question

Today I tested the performance impact of LINQ and PLINQ queries. For this I followed the MSDN article "How to: Measure PLINQ Query Performance".

void Main()
{
        var source = Enumerable.Range(0, 600000000);
        System.Diagnostics.Stopwatch sw;    

        var queryToMeasure1 = from num in source
                             where num % 3 == 0
                             select Math.Sqrt(num);

        var queryToMeasure2 = from num in source.AsParallel()
                             where num % 3 == 0
                             select Math.Sqrt(num);                          

        long freq = Stopwatch.Frequency;
        Console.WriteLine("Timer frequency in ticks per second = {0}", freq);

        Console.WriteLine("Measuring 1");
        sw = System.Diagnostics.Stopwatch.StartNew();   
        foreach (var n in queryToMeasure1) { }  
        Console.WriteLine("Total ticks: {0} - Elapsed time: {1} ms", sw.ElapsedTicks, sw.ElapsedMilliseconds);      

        Console.WriteLine("Measuring 2");
        sw = System.Diagnostics.Stopwatch.StartNew();   
        foreach (var n in queryToMeasure2) { }  
        Console.WriteLine("Total ticks: {0} - Elapsed time: {1} ms", sw.ElapsedTicks, sw.ElapsedMilliseconds);

        Console.WriteLine("Measuring 3");
        sw = System.Diagnostics.Stopwatch.StartNew();           
        System.Threading.Tasks.Parallel.ForEach(queryToMeasure1, n => {});          
        Console.WriteLine("Total ticks: {0} - Elapsed time: {1} ms", sw.ElapsedTicks, sw.ElapsedMilliseconds);

        Console.WriteLine("Measuring 4");
        sw = System.Diagnostics.Stopwatch.StartNew();           
        System.Threading.Tasks.Parallel.ForEach(queryToMeasure2, n => {});      
        Console.WriteLine("Total ticks: {0} - Elapsed time: {1} ms", sw.ElapsedTicks, sw.ElapsedMilliseconds);

        Console.WriteLine("Measuring 5");
        sw = System.Diagnostics.Stopwatch.StartNew();   
        queryToMeasure2.ForAll(n => {});    
        Console.WriteLine("Total ticks: {0} - Elapsed time: {1} ms", sw.ElapsedTicks, sw.ElapsedMilliseconds);
}

Test environment: LINQPad 4 on Windows 7 Enterprise, 64-bit, 8 GB RAM, i7-2600 (4 cores, 8 logical processors)

I found, and can't explain, that the query on one core (Measurement 1) is faster than the parallelized queries. Do I have to add more select delegates to benefit from parallel tasks?

But now the results:

Run 1: with an enumerable range of 60000:

Timer frequency in ticks per second = 3312851
Measuring 1
Total ticks: 3525 - Elapsed time: 1 ms
Measuring 2
Total ticks: 15802 - Elapsed time: 4 ms
Measuring 3
Total ticks: 5940 - Elapsed time: 1 ms
Measuring 4
Total ticks: 26862 - Elapsed time: 8 ms
Measuring 5
Total ticks: 4387 - Elapsed time: 1 ms

Run 2: with an enumerable range of 600000000:

Timer frequency in ticks per second = 3312851
Measuring 1
Total ticks: 29740243 - Elapsed time: 8977 ms
Measuring 2
Total ticks: 33722438 - Elapsed time: 10179 ms
Measuring 3
Total ticks: 77145502 - Elapsed time: 23286 ms
Measuring 4
Total ticks: 120078284 - Elapsed time: 36246 ms
Measuring 5
Total ticks: 30899585 - Elapsed time: 9327 ms

Interesting fact: running the garbage collector before the test script reduces the time for Measurement 4 vastly (from 36246 ms to 17668 ms):

Run 3: with an enumerable range of 600000000 and garbage collection triggered beforehand (from LINQPad):

Timer frequency in ticks per second = 3312851
Measuring 1
Total ticks: 29597830 - Elapsed time: 8934 ms
Measuring 2
Total ticks: 33532083 - Elapsed time: 10121 ms
Measuring 3
Total ticks: 76403692 - Elapsed time: 23062 ms
Measuring 4
Total ticks: 58534548 - Elapsed time: 17668 ms
Measuring 5
Total ticks: 32943622 - Elapsed time: 9944 ms
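For reference, a minimal sketch of how a collection could be forced from code before each timed section instead of through LINQPad. The class and method names (`GcBeforeMeasure`, `CollectGarbage`, `MeasureMs`) are made up for illustration, and it is only an assumption that this matches what LINQPad's garbage collection command does internally.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original test script.
static class GcBeforeMeasure
{
    // Requests a full blocking collection, lets pending finalizers run,
    // then collects again so objects released by finalizers are reclaimed too.
    public static void CollectGarbage()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }

    // Collects first, then times the action and returns elapsed milliseconds.
    public static long MeasureMs(Action action)
    {
        CollectGarbage();
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

With this, Measurement 4 could be timed as `GcBeforeMeasure.MeasureMs(() => System.Threading.Tasks.Parallel.ForEach(queryToMeasure2, n => {}))`, so every measurement starts from a comparable heap state.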

In conclusion, can I say that method 1 is the most suitable option for small select queries, and method 5 once the select delegates become more expensive?

Modulo and `Math.Sqrt` are not good candidates for PLINQ since they are trivial (too fast). The more expensive the delegate, the better. http://msdn.microsoft.com/en-us/library/dd997399.aspx You might also want to have a look at [my question here](http://stackoverflow.com/questions/7582591/how-to-plinq-an-existing-linq-query-with-joins); I've listed some recommended reading there. — Tim Schmelter, Aug 24 '12 at 07:26
Do you have some actual question? Or did you just want to start a discussion? If that's the case, then your “question” is not appropriate for SO. — svick, Aug 24 '12 at 09:49
No, I don't want to start a discussion; I just want to know which is the best method for some heavy work. I was not quite sure if the delegates were too cheap to make a qualified statement. According to MSDN it was. According to the guys over here, it's not. — Florian, Aug 24 '12 at 10:49

score 1 · Answer 1 · answered Aug 24 '12 at 10:36

Your calculations are very cheap. They are not even a good candidate for non-parallel LINQ, because the delegate calls might be more expensive than the calculations themselves. PLINQ adds a lot of overhead on top of that, such as starting tasks, synchronizing, and copying data across threads. Try this:

bool Where(int i) {
    // Burn CPU time so each element costs far more than the threading overhead.
    var sum = 0;
    for (var k = 0; k < 10000; k++) {
        sum += i;
    }
    return i % 3 == 0;
}

And use that function in the where clause. This function is very expensive so the overhead imposed by threading and synchronization will no longer dominate the running time.
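As a rough sketch of what such a comparison could look like outside LINQPad: the names `PlinqDemo`, `ExpensiveWhere` and `TimeMs` are made up for illustration, the range size of 200000 is an assumed value chosen because each element is now expensive, and the sketch assumes the JIT does not optimize the busy loop away.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class PlinqDemo
{
    // Hypothetical name; same idea as the Where function above.
    public static bool ExpensiveWhere(int i)
    {
        var sum = 0;
        for (var k = 0; k < 10000; k++)
        {
            sum += i; // int overflow wraps silently in the default unchecked context
        }
        return i % 3 == 0;
    }

    // Runs the action once and returns the elapsed wall-clock milliseconds.
    public static long TimeMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Assumed size, far smaller than the 600000000 used in the question.
        var source = Enumerable.Range(0, 200000);

        var sequential = source.Where(ExpensiveWhere).Select(n => Math.Sqrt(n));
        var parallel = source.AsParallel().Where(ExpensiveWhere).Select(n => Math.Sqrt(n));

        Console.WriteLine("Sequential: {0} ms", TimeMs(() => { foreach (var n in sequential) { } }));
        Console.WriteLine("PLINQ ForAll: {0} ms", TimeMs(() => parallel.ForAll(n => { })));
    }
}
```

On a multi-core machine the PLINQ line should now have a realistic chance of beating the sequential one, but the actual numbers depend on the hardware and runtime, so measure rather than assume.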

So basically, you are measuring the worst-case use-case for PLINQ. Try measuring an interesting one.
