Any better way to implement factorials?

Question

I have been working on a library to handle numbers exceeding the normal bounds of 8 bytes for long long (usually).

I'm talking hundreds to thousands of digits.

Now I have implemented a function for factorials looking like this:

largeNum factorial(largeNum& input) {
    if (input > one) return (input * factorial(input-one));
    else return one;
}

Now this gives me good results. 100! took about 5 seconds to calculate and that's already above 150 digits. The result was correct.

Though 5 seconds is a long time 200 would already take minutes to calculate.

WolframAlpha, for example, can calculate 100000! in less than 10 seconds.

So there's gotta be a better way to do it. I've been looking on https://en.wikipedia.org/wiki/Factorial for the so-called Gamma function and was wondering if that would help in any way.

You might be interested in the algorithm that Python uses (written in C): https://github.com/python/cpython/blob/v3.6.1/Modules/mathmodule.c#L1275 . Not state-of-the-art, but still reasonably simple and significantly faster than a straightforward loop. — Mark Dickinson, Apr 18 '17 at 15:30
You are right, your implementation is definitely tad slow... A simple iterative version on my laptop (notebook) of Core i5 4210U in Python 3.4 takes ~9s for `100,000!`. Ofcause, granted that Python has a highly optimized arbitrary integer precision implementation — WhiZTiM, Apr 18 '17 at 15:31
@WhiZTiM: It's not highly optimized (in fact, it's barely optimized at all). For that, you need GMP (`gmpy2` if you're using Python). — Mark Dickinson, Apr 18 '17 at 15:34
@Folling: I suspect that it's actually your `largeNum` implementation that you need to optimise, rather than the factorial implementation. — Mark Dickinson, Apr 18 '17 at 15:38
The problem is almost certainly in your large num implementation. I implemented in Python the naïve recursive approach that you used and it takes just a fraction of a second to compute 500! — John Coleman, Apr 18 '17 at 15:41
Using tail recursion will improve the performance to par with iteration. — 2785528, Apr 18 '17 at 17:08
Consider using . 100,000! in the simplest loop finishes in less than 2 seconds. — 2785528, Apr 18 '17 at 17:15
see [Fast exact bigint factorial](http://stackoverflow.com/q/18317648/2521214) and [Fast bignum square computation](http://stackoverflow.com/q/18465326/2521214) — Spektre, Apr 18 '17 at 18:43

score 3 · Accepted Answer · answered Apr 18 '17 at 15:26

Although it is hard to optimize code without seeing the implementation, you can certainly pick up some cycles by converting your recursive function to an iterative one, or by helping compiler do it for you by optimizing tail call.

largeNum factorial(largeNum& input) {
    largeNum res = one;    
    while (input > one) {
        res *= input;
        input -= one;
    }
    return res;
}

Of course, this is just a different implementation of the same "middle school" approach to computing factorials. If you are looking for an advanced algorithms, here is a page dedicated to comparing various "hard" implementations.

duffymo · Answer 2 · 2017-04-18T17:33:14.760

2

I'll disagree with all these answers and say that the typical iterative and recursive factorial implementations are naive and expensive for large input values.

The better way to do it is to use a gamma function (or, better still, natural log of gamma function).

This works because gamma(n) = (n-1)! or n! = gamma(n+1)

If you combine this with memoization you've got an efficient solution that works well for large arguments.

The natural log of gamma is especially suited for evaluating combinations and permutations.

edited Apr 18 '17 at 17:33

answered Apr 18 '17 at 17:25

duffymo

305,152
44
369
561

Any reference implementation that successfully and efficiently use gamma function directly to compute factorials? As it stands now, your answer is stating a mathematical fact (which is already mentioned by OP) with some general suggestion (memoization), but there is no actual implementation. Your link to gamma function doesn't show any code to calculate factorial (apart from the math fact), and your last link also doesn't contain any reference to gamma function. In fact, I'm not surprised if in a code that computes gamma function, it backs off to factorial when the input is an integer. – justhalf Apr 19 '17 at 02:47
1

You are wrong again. There are gamma and lngamma implementations in "Numerical Recipes in C". Maybe you should take the suggestion and do some work: http://www.it.uom.gr/teaching/linearalgebra/NumericalRecipiesInC/c6-1.pdf – duffymo Apr 19 '17 at 09:10
Great! That's exactly what you need to include in your answer =) and perhaps copy parts of the PDF into your answer. Also don't forget to say things about the inexactness of floating-point arithmetic when n is large. Then this answer will be complete =D – justhalf Apr 19 '17 at 09:17
I don't care about your approval or being complete. Go look at what you were sent and write some code. – duffymo Apr 19 '17 at 09:18

XZ6H · Answer 3 · 2017-04-19T10:26:19.053

You can use threads to speed up the calculation either using unix pthread or C++ std::thread which is also cross platform. This will have performance gain only if the number is a large one or else it is not enough to offset the cost of thread creation.

Edit: This program uses four threads to calculate the factorial.

After running the program 8 times, the average threaded factorial time is 14 seconds and average non threaded factorial time is 18 seconds.
Sample program:

#include <iostream>
#include <thread>
#include <chrono>
#include "BigInt.h"

void fact(int upper, int lower, BigInt& val)
{
    for (auto i = upper; i >= lower; i--)
    {
        val = val*i;
    }
}

int main()
{
    std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();

    int n = 1000;
    BigInt val1("1"), val2("1"), val3("1"), val4("1");

    std::thread thr1(&fact, n, (3*n)/4, std::ref(val1));
    std::thread thr2(&fact, ((3 * n) / 4) - 1, n/2, std::ref(val2));
    std::thread thr3(&fact, (n / 2)-1, n/4,  std::ref(val3));
    std::thread thr4(&fact, (n/4)-1, 1, std::ref(val4));

    thr1.join();
    thr2.join();
    thr3.join();
    thr4.join();
    auto ans = val1*val2*val3*val4;

    std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count();
    std::cout << "threaded factorial time: " << duration << "\n";

    t1 = std::chrono::high_resolution_clock::now();
    BigInt ans2("1");
    fact(n, 1, std::ref(ans2));
    t2 = std::chrono::high_resolution_clock::now();
    duration = std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count();
    std::cout << "non threaded factorial time: " << duration;

    return 0;
}

How exactly would you do that? There is nothing to parallelize. Each step in calculations depends on previous one. If you are talking about some other algorithm, which can be parallelized, explain it. — Revolver_Ocelot, Apr 18 '17 at 16:01
@Revolver_Ocelot No, Each step in calculations does not depend on previous ones. You can compute n to n/2 in one thread and n/2 to 1 in a separate thread and multiply the results of both threads. — XZ6H, Apr 19 '17 at 10:18
Naive recursive approach shown in OP post does depend on prevous calculations. You had to completely rewrite code to be able to parallelize. — Revolver_Ocelot, Apr 19 '17 at 18:30

Any better way to implement factorials?

3 Answers3