
As part of my BigDecimal library, I need to calculate the factorial of any given non-negative integer. So I'm using .NET 4.0's System.Numerics.BigInteger to be able to store huge numbers. Here is the function I'm using:

private BigInteger Factorial(BigInteger x)
{
     BigInteger res = x;
     x--;
     while (x > 1)
     {
          res *= x;
          x--;
     }
     return res;
}

It works, but it's not optimized. Now I want to use parallel computing, so here is what I've tried (I have no experience with parallel programming):

public BigInteger Factorial(long x)
{
     BigInteger res = 1;
     ParallelLoopResult r = Parallel.For(2L, (x + 1), i =>
          res *= i
     );
     return res;
}

The weird problem is that the above function works perfectly for small numbers like 5!, but for big numbers like 1000! it returns a completely different result each time. So I realized it's not thread-safe and the problem is with the variable res. I wonder what the correct implementation is?
It would also be nicer if I could use BigInteger instead of long for the variable x.
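For reference, `Parallel.For` has an overload with per-thread state (`localInit`/`localFinally`) that avoids exactly this kind of race: each thread multiplies into its own local accumulator, and the partial products are merged under a lock only once per thread. A minimal sketch (class and member names are illustrative, not from the question):

```csharp
using System;
using System.Numerics;
using System.Threading.Tasks;

class FactorialSketch
{
    static readonly object Gate = new object();

    public static BigInteger Factorial(long x)
    {
        BigInteger result = BigInteger.One;
        Parallel.For(2L, x + 1,
            () => BigInteger.One,               // localInit: one accumulator per thread
            (i, state, local) => local * i,     // loop body: touches no shared state
            local =>                            // localFinally: runs once per thread
            {
                lock (Gate) { result *= local; }
            });
        return result;
    }

    static void Main()
    {
        Console.WriteLine(Factorial(5));   // 120
        Console.WriteLine(Factorial(20));  // 2432902008176640000
    }
}
```

Because multiplication is commutative and associative, it doesn't matter in which order the per-thread partial products are merged.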

SepehrM

3 Answers


You need to make sure your parallel processes do not share any state.

For example, in the case of the factorial, I would do the following:

  • set a degree of parallelism (DOP) - usually the number of processors on your machine for compute-bound operations
  • divide the problem into independent subsets
  • process each subset in parallel
  • aggregate the results obtained from the parallel processes

This is essentially a simplified map-reduce.

The problem consists of multiplying together a set of numbers. One way to divide this set into subsets is to use N parallel loops, each starting at a value i (where 0 < i <= N) and advancing with a step of N (where N = DOP).

Here's the code that does that:

/// <summary>
/// The max number of parallel tasks
/// </summary>
static readonly int DegreeOfParallelism = Environment.ProcessorCount;

public BigInteger Factorial(long x)
{
    // Make as many parallel tasks as our DOP
    // And make them operate on separate subsets of data
    var parallelTasks =
        Enumerable.Range(1, DegreeOfParallelism)
                    .Select(i => Task.Factory.StartNew(() => Multiply(x, i),
                                 TaskCreationOptions.LongRunning))
                    .ToArray();

    // after all tasks are done...
    Task.WaitAll(parallelTasks);

    // ... take the partial results and multiply them together
    BigInteger finalResult = 1;

    foreach (var partialResult in parallelTasks.Select(t => t.Result))
    {
        finalResult *= partialResult;
    }

    return finalResult;
}

/// <summary>
/// Multiplies all the integers up to upperBound, with a step equal to DOP
/// starting from a different int
/// </summary>
/// <param name="upperBound"></param>
/// <param name="startFrom"></param>
/// <returns></returns>
public BigInteger Multiply(long upperBound, int startFrom)
{
    BigInteger result = 1;

    // use a long counter so the loop is safe for upper bounds beyond int.MaxValue
    for (long i = startFrom; i <= upperBound; i += DegreeOfParallelism)
        result *= i;

    return result;
}

On my machine this computes 100000! in about 30 seconds and the result is what Wolfram Alpha says it should be.

Update

After running a few more tests, I discovered something which I didn't expect: printing out the 100000! result to the console takes ~18 seconds (the result has 456574 digits).

The results of the 100000! computation alone (without printing the number) are:

  • ~10 seconds for parallel execution
  • ~16 seconds for sequential execution
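The timings above were presumably taken with something like `Stopwatch`; here is a minimal harness sketch for reproducing such measurements (the `Time` helper is hypothetical, not part of the answer):

```csharp
using System;
using System.Diagnostics;
using System.Numerics;

class BenchmarkSketch
{
    // Hypothetical helper: time a single action with a high-resolution timer.
    static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        BigInteger result = BigInteger.One;
        var elapsed = Time(() =>
        {
            for (long i = 2; i <= 10000; i++) result *= i;
        });
        // 10000! has 35660 digits; the elapsed time will vary by machine.
        Console.WriteLine($"10000! ({result.ToString().Length} digits) in {elapsed.TotalMilliseconds:F0} ms");
    }
}
```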
Cristian Lupascu
  • 2
    Clever trick: _i += DegreeOfParallelism_. You could use for DegreeOfParallelism: `Environment.ProcessorCount` – Jeroen van Langen Sep 20 '13 at 08:11
  • @JeroenvanLangen great, I've edited my answer to include your suggestion – Cristian Lupascu Sep 20 '13 at 08:32
  • Thanks, your approach is perfect but the code yields wrong results on my machine. The calculated result from your code for 100000! has 456569 digits, while the actual number has exactly 456574 digits. http://www.wolframalpha.com/input/?i=how+many+digits+in+%28100000!%29 Is this problem related to my machine? – SepehrM Sep 20 '13 at 08:55
  • @SepehrM it was an off-by-one error in the last `for` loop (`i < upperBound` instead of `i <= upperBound`). I have fixed it. – Cristian Lupascu Sep 20 '13 at 08:58
  • Thanks, it works well. Should DOP be the number of CPU cores or the number of hardware threads? – SepehrM Sep 20 '13 at 10:28
  • I get better results with DOP being the actual number of CPU cores. – SepehrM Sep 20 '13 at 10:40
  • @SepehrM yes, for [CPU-bound](http://en.wikipedia.org/wiki/CPU_bound) operations, such as this one. For [IO-bound](http://en.wikipedia.org/wiki/I/O_bound) operations (ex: network, DB, HDD) you can have much more. – Cristian Lupascu Sep 20 '13 at 10:55
  • 1
    @SepehrM in this case the CPU is the bottleneck and having higher DOP results in doing the actual processing + context switching between threads. – Cristian Lupascu Sep 20 '13 at 10:56

Foreword

Based on some initial and very simple benchmarking, the parallel version works faster for really big factorials (larger than ~1000!). For smaller ones, the overhead of parallel processing trumps everything else and the sequential version is simply faster.

Code

That being said, here's what I got to work in LINQPad:

public static class Math
{
    // Sequential execution
    public static System.Numerics.BigInteger Factorial(System.Numerics.BigInteger x)
    {
        System.Numerics.BigInteger res = x;
        x--;
        while (x > 1)
        {
            res *= x;
            x--;
        }
        return res;
    }
    
    public static System.Numerics.BigInteger FactorialPar(System.Numerics.BigInteger x)
    {
        return NextBigInt().TakeWhile(i => i <= x).AsParallel().Aggregate((acc, item) => acc * item);
    }
    
    public static IEnumerable<System.Numerics.BigInteger> NextBigInt()
    {
        System.Numerics.BigInteger x = 0;
        while(true)
        {
            yield return (++x);
        }
    }
}

That works for both small (5! = 120, 6! = 720) and large (~8000!) factorials. As I mentioned though, there's a speed increase (2 - 3 times faster) for large factorials, but a serious performance penalty (up to two orders of magnitude) for small ones (results after warmup in LINQPad):

  • 6! x 20 -> Serial avg ticks/std dev: 4.2/2.014, Parallel avg ticks/std dev: 102.6/39.599 (parallel execution is 25 times slower...)

  • 300! x 20 -> Serial avg ticks/std dev: 104.35, Parallel avg ticks/std dev: 405.55/175.44 (parallel runs at 1/4th of sequential, speed-wise)

  • 1000! x 20 -> Serial avg ticks/std dev: 2672.05/615.744, Parallel avg ticks/std dev: 3778.65/3197.308 (parallel runs at ~70 - 90% of sequential speed)

  • 10000! x 20 -> Serial avg ticks/std dev: 286774.95/13666.607, Parallel avg ticks/std dev: 144932.25/16671.931 (parallel is 2x faster)

Take those with a grain of salt: you'd need to compile a release version and run it standalone to get 'real' results, but there's a trend there that is worth taking into consideration.

100000! (with printing and everything) took 26 seconds on my machine with parallel execution in LINQPad.

Patryk Ćwiek
  • This seems like a faster approach, but I didn't see any performance difference in my test of computing 1,000,000!, as both answers took 6.5 minutes on my machine just to calculate it and store it in a variable. – SepehrM Sep 20 '13 at 10:25

Try this for a simpler solution:

Func<int, BigInteger> factorialAgg = n => n < 2 ? BigInteger.One 
                      : Enumerable.Range(2, n-1)
                        .AsParallel()
                        .Aggregate(BigInteger.One, (r, i) => r * i);

var result = factorialAgg(100000);
GKA
  • And for an even shorter one: `Func<int, BigInteger> Factorial = n => Enumerable.Range(1, n).AsParallel().Aggregate(BigInteger.One, (r, i) => r * i);` – Kobor42 Mar 05 '17 at 09:51
  • @Kobor42 the version you updated my answer to, does an extra multiplication by 1 which is unnecessary... It also goes into generating ranges for 0, 1 as well. I don't think it is as efficient. I did have it like that initially but everywhere you look the implementation for factorial checks for 0, 1 and returns immediately and you do the rest of the range for greater than 1... There is a reason.... – GKA Mar 26 '17 at 13:09
  • @GKA You are right. I didn't focus on that part. I was caring more about the extra typecast, and the possibility to add parallelism. Properly implemented parallel aggregate can really increase speed. Pity, that it doesn't on 4.5.1. I measured it. :-( Anyway applied your suggestion. – Kobor42 Apr 04 '17 at 16:22
  • @GKA The overload of `Aggregate` you use does not do work in parallel, because it is not known how to combine accumulated results. You should use overload without `seed`: `ParallelEnumerable.Range(2, n-1).Select(i => (BigInteger)i).Aggregate((a, b) => a*b)`. Or overload, which specifies how to combine accumulates: `ParallelEnumerable.Range(2, n-1).Aggregate(BigInteger.One, (a, b) => a*b, (c, d) => c*d, e => e)`. – user4003407 Apr 04 '17 at 17:21
  • @Kobor42 You edit this question, but as result of code reduction, you change `Aggregate` overload to one, which is not entitled for parallel invocation. – user4003407 Apr 05 '17 at 17:38
  • @PetSerAl Unfortunately what I see now is not my original answer but an edited one by Kobor42. My original answer uses the right function – GKA Apr 28 '17 at 06:35
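To make user4003407's point concrete: the seeded two-argument `Aggregate` overload gives PLINQ no way to combine per-partition results, so it degenerates to sequential execution. A sketch of the four-argument overload that can actually run in parallel (the class name is illustrative):

```csharp
using System;
using System.Linq;
using System.Numerics;

class ParallelAggregateSketch
{
    public static BigInteger Factorial(int n)
    {
        if (n < 2) return BigInteger.One;
        return ParallelEnumerable.Range(2, n - 1)
            .Aggregate(
                BigInteger.One,          // seed for each partition
                (acc, i) => acc * i,     // fold within a partition
                (a, b) => a * b,         // combine partition results
                result => result);       // final projection
    }

    static void Main()
    {
        Console.WriteLine(Factorial(6));   // 720
    }
}
```

Because multiplication is associative and commutative, combining the partition products in any order yields the same result, which is what makes this overload safe to parallelize here.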