The question

In a LINQ query I can correctly (as in: the compiler won't complain) call .AsParallel() like this:

(from l in list.AsParallel() where <some_clause> select l).ToList();

or like this:

(from l in list where <some_clause> select l).AsParallel().ToList();

What exactly is the difference?

What I've tried

In the official documentation I've almost always seen the first form used, so I assumed that was the way to go.
Today, though, I ran a benchmark myself and the result was surprising. Here's the code I ran:

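using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Fill a list with 100,000 random integers.
var list = new List<int>();
var rand = new Random();
for (int i = 0; i < 100000; i++)
    list.Add(rand.Next());

var threshold = 1497234;

var sw = new Stopwatch();

// Form 1: AsParallel() on the source, before the where clause.
sw.Restart();
var result = (from l in list.AsParallel() where l > threshold select l).ToList();
sw.Stop();

Console.WriteLine($"call .AsParallel() before: {sw.ElapsedMilliseconds}");

// Form 2: AsParallel() on the finished query, after the where clause.
sw.Restart();
result = (from l in list where l > threshold select l).AsParallel().ToList();
sw.Stop();

Console.WriteLine($"call .AsParallel() after: {sw.ElapsedMilliseconds}");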

Output

call .AsParallel() before: 49
call .AsParallel() after: 4

So, apparently, despite what the documentation suggests, the second form is much faster. What exactly is happening here?

2 Answers

The trick to using AsParallel in general is to decide if the savings from parallelism outweigh the overhead of doing things in parallel.

When conditions are cheap to evaluate, as yours are, the overhead of creating multiple parallel streams and collecting their results at the end greatly outweighs the benefit of performing the comparisons in parallel.

When conditions are computationally intensive, making the AsParallel call early speeds things up quite a bit, because the overhead is now small compared to the benefit of running multiple Where computations in parallel.

For an example of a computationally hard condition, consider a method that decides whether a number is prime or not. Doing this in parallel on a multi-core CPU will show significant improvement over the non-parallelised implementation.
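As a rough illustration of that prime-number case, here is a minimal sketch. The names PrimeDemo, IsPrime, CountSequential and CountParallel, as well as the range being scanned, are made up for this example, and the trial-division test is deliberately naive so that each call is expensive. Actual timings depend on the machine and the number of cores.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class PrimeDemo
{
    // Naive trial division: intentionally slow so the predicate dominates the cost.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Predicate runs on the calling thread only (Enumerable.Count).
    public static int CountSequential(int start, int count) =>
        Enumerable.Range(start, count).Count(IsPrime);

    // AsParallel() comes first, so the predicate is evaluated by PLINQ (ParallelEnumerable.Count).
    public static int CountParallel(int start, int count) =>
        Enumerable.Range(start, count).AsParallel().Count(IsPrime);

    static void Main()
    {
        // Hypothetical workload: large numbers make each IsPrime call costly.
        const int start = 1_000_000_000, count = 200_000;

        var sw = Stopwatch.StartNew();
        int a = CountSequential(start, count);
        Console.WriteLine($"sequential: {a} primes in {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        int b = CountParallel(start, count);
        Console.WriteLine($"parallel:   {b} primes in {sw.ElapsedMilliseconds} ms");
    }
}
```

Both counts should agree. On a multi-core machine the parallel run would normally be the faster one here, whereas with a trivial predicate such as l > threshold the ordering can flip, as in the question.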

BJ Myers
Sergey Kalinichenko
  • Thank you for your answer, but I still don't quite understand the difference between the two invocations. Do you mean that when I call .AsParallel() the second way (at the end of the query), I'm not actually parallelizing anything? – Oct 23 '16 at 17:14
  • 3
    @Mahatma Yes, by then the work is already done in sequential mode. All LINQ needs is to collect results from parallel streams to a single list. – Sergey Kalinichenko Oct 23 '16 at 17:16

In the second form, AsParallel is unnecessary: it is applied after some_clause, so it does not affect how some_clause is evaluated, which still happens sequentially.

See also the test code below:

[TestMethod]
public void Test()
{
    var items = Enumerable.Range(0, 10);
    int sleepMs;
    for (int i = 0; i <= 4; i++)
    {
        sleepMs = i * 25;
        var elapsed1 = CalcDurationOfCalculation(() => items.AsParallel().Select(SomeClause).ToArray());
        var elapsed2 = CalcDurationOfCalculation(() => items.Select(SomeClause).AsParallel().ToArray());

        Trace.WriteLine($"{sleepMs}: T1={elapsed1} T2={elapsed2}");
    }

    long CalcDurationOfCalculation(Action calculation)
    {
        var watch = new Stopwatch();
        watch.Start();
        calculation();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    int SomeClause(int value)
    {
        Thread.Sleep(sleepMs);
        return value * 2;
    }
}

and the output:

0: T1=77 T2=11
25: T1=103 T2=272
50: T1=202 T2=509
75: T1=303 T2=758
100: T1=419 T2=1010
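The timings fit how the calls bind: when AsParallel() comes first the source is a ParallelQuery<T>, so Where/Select resolve to the ParallelEnumerable versions, whereas when it comes last they resolve to the ordinary Enumerable versions and the clause is already attached to a sequential query. One way to observe this without relying on timings is to count how many clause invocations overlap. The following is only a sketch; ConcurrencyProbe, SlowClause and MaxConcurrency are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class ConcurrencyProbe
{
    static int _running;  // clause invocations currently in progress
    static int _maxSeen;  // highest value _running has reached so far

    // Hypothetical slow clause that records how many calls overlap.
    public static bool SlowClause(int value)
    {
        int now = Interlocked.Increment(ref _running);

        // Raise the recorded peak with a compare-and-swap retry loop.
        int seen;
        while (now > (seen = Volatile.Read(ref _maxSeen)) &&
               Interlocked.CompareExchange(ref _maxSeen, now, seen) != seen) { }

        Thread.Sleep(20); // simulate work
        Interlocked.Decrement(ref _running);
        return value % 2 == 0;
    }

    // Runs the given query over 16 items and returns the peak overlap observed.
    public static int MaxConcurrency(Func<IEnumerable<int>, List<int>> run)
    {
        _running = 0;
        _maxSeen = 0;
        run(Enumerable.Range(0, 16));
        return _maxSeen;
    }

    static void Main()
    {
        int before = MaxConcurrency(items => items.AsParallel().Where(SlowClause).ToList());
        int after = MaxConcurrency(items => items.Where(SlowClause).AsParallel().ToList());

        // "before" depends on the number of cores; "after" should be 1,
        // because the clause is evaluated by the sequential Enumerable.Where.
        Console.WriteLine($"AsParallel() before: max concurrent clause calls = {before}");
        Console.WriteLine($"AsParallel() after:  max concurrent clause calls = {after}");
    }
}
```

On a multi-core machine the first figure would typically be greater than 1, while the second should stay at 1, which matches T2 growing roughly as 10 × sleepMs in the output above.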
Georg