21

Following is an interview question:

The following one-liner generates and displays the list of first 500 prime numbers. How would you optimize it using parallel LINQ while still keeping it a SINGLE C# STATEMENT:

MessageBox.Show(string.Join(",", 
    Enumerable.Range(2, (int)(500 * (Math.Log(500) + Math.Log(System.Math.Log(500)) - 0.5)))
                .Where(x => Enumerable.Range(2, x - 2)
                                      .All(y => x % y != 0))
                .TakeWhile((n, index) => index < 500)));

I tried introducing AsParallel() as well as ParallelEnumerable into the query, but did not see any tangible benefits on multi-core machines. The query still uses one CPU core heavily while other cores enjoy leisure time. Can someone suggest an improvement that would distribute the load equally on all cores and thus reduce execution time?

For the enthusiast: The following formula returns an upper bound which is guaranteed to be greater than N prime numbers, i.e. if you check up to this number, you'll sure find N primes smaller than it:

UpperBound = N * (Log(N) + Log(Log(N)) - 0.5) //Log is natural log
dotNET
  • 33,414
  • 24
  • 162
  • 251
  • See http://stackoverflow.com/questions/1510124/program-to-find-prime-numbers – abatishchev Dec 11 '14 at 05:01
  • 1
    @abatishchev: The edit you made is invalid. The question explicitly asks to keep it a single statement. Also it introduces a compile-time error. I'll revert it. Plus the link you have shared is different for multiple reasons. For one, it doesn't use parallelism. Secondly, it shows primes between x and y whereas this question shows first N primes, which is totally different concept (since you don't have the upper bound in this case). – dotNET Dec 11 '14 at 05:35
  • @dotNET, unfortunately you still have an error in the code after abatishchev's poor edit. Missing the closing paren on the Range method. – Adam Smith Dec 11 '14 at 05:44
  • Also, thanks for the explanation on the upper bound formula, very cool! – Adam Smith Dec 11 '14 at 05:45
  • I just tried to make it readable. 236 char string is not something readable. My edit is still valid though. You've asked for 1-liner generator. The initial 1-liner generates and outputs. After my edit generator is still 1-liner. – abatishchev Dec 11 '14 at 18:15
  • I've shared that link just for your information. If I thought these are duplicates, I'd vote to close. But I didn't. – abatishchev Dec 11 '14 at 18:17
  • Please don't include an answer into the question. Post it separately, and accept it as appropriate. – abatishchev Dec 11 '14 at 18:19
  • Argh! I wouldn't write something so hard to debug in the first place! – Loren Pechtel Sep 15 '21 at 18:33

4 Answers4

18

This does it nicely on my machine. I've never actually seen all my cores go to 100% until now. Thanks for giving me an excuse to play :)

I increased the numbers until I had a time slow enough to measure (20,000).

The key option that made the difference to me was setting the ExecutionMode to ForceParallelism.

Because I use a NotBuffered merge option, I re-sort it when I'm done. This would not be necessary if you don't care about the order of the results (perhaps you're putting the results in a HashSet).

The DegreeOfParallelism and MergeOptions only provided minor gains (if any) to performance on my machine. This example shows how to use all the options in a single Linq statement, which was the original question.

var numbers = Enumerable.Range(2, (int)(20000 * (Math.Log(20000) + Math.Log(System.Math.Log(20000)) - 0.5)))
                .AsParallel()
                .WithDegreeOfParallelism(Environment.ProcessorCount) 
                .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
                .WithMergeOptions(ParallelMergeOptions.NotBuffered) // remove order dependancy
                .Where(x => Enumerable.Range(2, x - 2)
                                      .All(y => x % y != 0))
                .TakeWhile((n, index) => index < 20000);
string result = String.Join(",",numbers.OrderBy (n => n));
Mark Peters
  • 17,205
  • 2
  • 21
  • 17
  • Remarkable. This cuts time by nearly half. From 16.1 seconds to 8.5 seconds for 10,000 primes on my 2-cores machine. I'm making a slight edit by changing `8` to `Environment.ProcessorCount` to make it generic. Thanks a ton. I'll accept this one if we don't get anything better soon enough. – dotNET Dec 11 '14 at 06:12
  • After adding SQRT in the inner `Enumerable.Range()`, time was reduced from 8.5 seconds to 0.2 second. – dotNET Dec 11 '14 at 08:57
7

You can check only SQRT of value to do it (upgraded code from above)

var numbers = new[] {2, 3}.Union(Enumerable.Range(2, (int) (i*(Math.Log(i) + Math.Log(Math.Log(i)) - 0.5)))
                                           .AsParallel()
                                           .WithDegreeOfParallelism(Environment.ProcessorCount)
                                           // 8 cores on my machine
                                           .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
                                           .WithMergeOptions(ParallelMergeOptions.NotBuffered)
                                           // remove order dependancy
                                           .Where(x => Enumerable.Range(2, (int) Math.Ceiling(Math.Sqrt(x)))
                                                                 .All(y => x%y != 0))
                                           .TakeWhile((n, index) => index < i))
                          .ToList();

but it's crazy when you have a simple and extremly fast alhoritm:

private static IEnumerable<int> GetPrimes(int k)
{
    int n = (int)Math.Ceiling((k * (Math.Log(k) + Math.Log(Math.Log(k)) - 0.5)));
    bool[] prime = new bool[n + 1];
    prime[0] = prime[1] = false;
    for (int i = 2; i < prime.Length; i++)
    {
        prime[i] = true;
    }
    for (int i = 2; i*i <= n; ++i) // valid for n < 46340^2 = 2147395600
        if (prime[i])
        {
            for (int j = i*i; j <= n; j += i)
                prime[j] = false;
            yield return i;
        }
}

of course it isn't as good as LINQ because is not fashionable way to solve a problem, but you should know that it exists.

Alex Zhukovskiy
  • 9,565
  • 11
  • 75
  • 151
  • Thanks for the input. +1 for reminding me about the SQRT thing. About your algorithm, this is the famous sieve algo, right? I already know about it, but the question I was facing explicitly asked to do it in one statement, which can't be accomplished without LINQ, so it was not about being fashionable, it was just a requirement. Thanks anyway. :) – dotNET Dec 11 '14 at 08:40
  • As a precaution, please note that `Math.Ceiling()` together with `Math.Sqrt()` will cause 2 and 3 to be considered as composite, so one should add them manually to the output list. – dotNET Dec 11 '14 at 09:00
  • @dotNET Sure. See `Union` method from original code – Alex Zhukovskiy Dec 11 '14 at 10:52
2

For future readers, this is what I ended up with. It is pretty fast. On my humble machine, it generates the list of first 20,000 prime numbers under a second.

Enumerable.Range(5, (int)(N * (Math.Log(N) + Math.Log(System.Math.Log(N)) - 0.5)))
            .AsParallel()
            .WithDegreeOfParallelism(Environment.ProcessorCount)
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .WithMergeOptions(ParallelMergeOptions.NotBuffered) // remove order dependancy
            .Where(x => Enumerable.Range(2, (int)Math.Ceiling(Math.Sqrt(x)))
                                  .All(y => x % y != 0))
            .TakeWhile((n, index) => index < N).Concat(new int[] { 2, 3 }.AsParallel()).OrderBy(x => x).Take(N);
dotNET
  • 33,414
  • 24
  • 162
  • 251
  • You can further cut the time on query by changing `.TakeWhile((n, index) => index < N)` to `.TakeWhile((n, index) => index < N-2)` and then removing `.Take(N)`. – adityap Aug 24 '18 at 20:13
-5
Stopwatch t = new Stopwatch();
            t.Start();
            var numbers = Enumerable.Range(2, (int)(500 * (Math.Log(500) + Math.Log(System.Math.Log(500)) - 0.5)))
                .Where(x => Enumerable.Range(2, x - 2)
                                      .All(y => x % y != 0))
                .TakeWhile((n, index) => index < 500);
            t.Stop();
            MessageBox.Show(t.ElapsedMilliseconds.ToString());
            MessageBox.Show(string.Join(",", numbers));

It is evaluating in 3 millisecond. Good linq query.

Mahesh Malpani
  • 1,782
  • 16
  • 27