I need to calculate Stirling's approximation very fast

Question

I'm writing a small library for statistical sampling which needs to run as fast as possible. In profiling I discovered that around 40% of the time taken in the function is spent computing Stirling's approximation for the logarithm of the factorial. I'm focusing my optimization efforts on that piece of it. Here's my code (which uses MPFR):

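#include <mpfr.h>

// ln(k!) for k = 0..7, looked up directly for small arguments.
const double AL[8] =
{ 0.0, 0.0, 0.6931471806, 1.791759469, 3.178053830, 4.787491743,
    6.579251212, 8.525161361 };

// Approximates ln(val!) and stores the result in ret.
// LV is a scratch mpfr_t that is not declared in this snippet
// (presumably a class member initialised elsewhere).
void HGD::mpfr_afc(mpfr_t &ret, const mpfr_t &val){

    if(mpfr_cmp_ui(val, 7) <= 0){
        mpfr_set_d(ret, AL[mpfr_get_ui(val, MPFR_RNDN)], MPFR_RNDN);
    }else{
        // Stirling: (val + 0.5) * ln(val) - val + 0.5 * ln(2*pi)
        mpfr_set(ret, val, MPFR_RNDN);
        mpfr_add_d(ret, ret, 0.5, MPFR_RNDN);
        mpfr_log(LV, val, MPFR_RNDN);
        mpfr_mul(ret, LV, ret, MPFR_RNDN);
        mpfr_sub(ret, ret, val, MPFR_RNDN);
        mpfr_add_d(ret, ret, 0.9189385332, MPFR_RNDN); // 0.5 * ln(2*pi)
    }
}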

I have a few ideas:

Precompute more than the first 8 inputs to the function.
Optimize the math (use a coarser approximation when less precision is needed).
Use multiple threads to compute on different inputs in parallel.
Switch to native arithmetic when numbers can fit in machine data types.

Are there other approaches I could take?

I apologise in advance if this is naive, but if you break down your profiling further, which part(s) of this function take the longest? — Russ Clarke, Feb 12 '14 at 03:10
I'm suspicious of your constants: for a `double` I'd expect 15 to 17 significant digits. But then maybe the end result does not need a high-precision answer. — chux - Reinstate Monica, Feb 12 '14 at 03:16
How often are you calling this function with the same value of `val`? — Glenn Teitelbaum, Feb 12 '14 at 03:16
Call mpfr_afc() less often? What size numbers is it working on? — brian beuning, Feb 12 '14 at 03:59
What are the range and distribution of `val`? How often is `val` less than 8? — pat, Feb 12 '14 at 05:41
I had to look at [`scipy`](http://docs.scipy.org/doc/scipy/reference/)'s implementation of Stirling's formula to understand the problem: https://github.com/scipy/scipy/blob/6a4460f68315f0669604054be91ceeacd606f0b6/scipy/special/cephes/gamma.c#L293 — iljau, Feb 12 '14 at 08:11
`val` is less than 8 very infrequently overall except when the inputs to the sampling function are small themselves. — pg1989, Feb 12 '14 at 21:11

score 2 · Accepted Answer · answered Feb 12 '14 at 17:23

> Switch to native arithmetic when numbers can fit in machine data types

That would be my first attempt. MPFR is likely to be a performance killer.

It seems to me you want to compute the logarithm of n!, which you are already approximating with Stirling's formula.

Note that n! = Gamma(n+1). There are (seemingly) highly optimized functions to compute both the Gamma function and its logarithm. For example:
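One such function in standard C++ is `std::lgamma` from `<cmath>` (C99 `lgamma`), which returns the natural logarithm of the absolute value of the Gamma function. A minimal sketch, assuming the argument fits in a `double`; the wrapper name `log_factorial_lgamma` is made up for illustration:

```cpp
#include <cmath>
#include <cstdint>

// ln(n!) via the identity n! = Gamma(n + 1).
// Assumes n is small enough to be represented exactly as a double
// (n < 2^53); beyond that the conversion rounds the argument.
inline double log_factorial_lgamma(std::uint64_t n) {
    return std::lgamma(static_cast<double>(n) + 1.0);
}
```

For example, `log_factorial_lgamma(5)` should agree with `std::log(120.0)` to within rounding. One caveat: on POSIX systems `lgamma` may also write the sign of the result to the global `signgam`, so it is worth checking your platform's thread-safety notes before calling it from several threads.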

I would roll my own coarser approximation only if all the above fails performance-wise.

score 1 · Answer 2 · answered Feb 12 '14 at 17:14

A couple of thoughts here. First, it occurs to me that using MPFR for this may be overkill. Any multi-precision library has enormous overhead compared with native floating point. Not just a lot of overhead, but enormous overhead. Second, maybe you don't need the multi-precision log function. Maybe you could get away with the standard log?
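To make the "standard log" idea concrete, here is a rough sketch of the same routine in plain `double` arithmetic. It assumes `double` precision is acceptable for the sampler, which only the asker can confirm. The names `kLogFactorial` and `log_factorial_stirling` are hypothetical, and the two correction terms 1/(12n) - 1/(360n^3) from Stirling's series are an addition that the question's code does not have:

```cpp
#include <cmath>
#include <cstdint>

// ln(k!) for k = 0..7, computed once at start-up instead of typed in by hand.
static const double kLogFactorial[8] = {
    0.0, 0.0, std::log(2.0), std::log(6.0), std::log(24.0),
    std::log(120.0), std::log(720.0), std::log(5040.0)};

// Approximates ln(n!) with Stirling's series in native double arithmetic:
//   (n + 0.5) * ln(n) - n + 0.5 * ln(2*pi) + 1/(12 n) - 1/(360 n^3)
double log_factorial_stirling(std::uint64_t n) {
    if (n <= 7) {
        return kLogFactorial[n];  // small arguments: table lookup
    }
    const double x = static_cast<double>(n);
    const double kHalfLog2Pi = 0.9189385332046727;  // 0.5 * ln(2*pi)
    return (x + 0.5) * std::log(x) - x + kHalfLog2Pi
           + 1.0 / (12.0 * x) - 1.0 / (360.0 * x * x * x);
}
```

The first omitted term of the series is 1/(1260 n^5), which is about 2.4e-8 at n = 8 and shrinks quickly as n grows. If that is not accurate enough, the table can be extended or more terms added before reaching for MPFR again.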

If you can't fit your computations in a double-precision float, then parallelizing using threads or other methods should help, provided the inputs are independent of each other. You could try playing with compiler optimizations, but I've not seen real improvements from that.
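As a sketch of the threading idea, the batch of inputs can be split into contiguous slices, one per worker. Everything here is illustrative: `evaluate_parallel` and `stirling_term` are made-up names, and the `double` stand-in would be replaced by the real (possibly MPFR) evaluation. With MPFR, each thread would need its own scratch variables, since a shared temporary like `LV` in the question would be a data race.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the per-input work (hypothetical; the real routine goes here).
static double stirling_term(double n) {
    return (n + 0.5) * std::log(n) - n + 0.9189385332046727;  // + 0.5*ln(2*pi)
}

// Evaluates every input on one of `num_threads` workers. Each worker owns a
// contiguous, non-overlapping slice of `out`, so no locking is needed.
std::vector<double> evaluate_parallel(const std::vector<double>& inputs,
                                      unsigned num_threads) {
    std::vector<double> out(inputs.size());
    num_threads = std::max(1u, num_threads);
    const std::size_t chunk = (inputs.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(inputs.size(), t * chunk);
        const std::size_t end = std::min(inputs.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&inputs, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                out[i] = stirling_term(inputs[i]);
            }
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

This only pays off when many values are evaluated per call. For a handful of inputs the cost of starting threads will likely outweigh the work itself, so it is worth measuring before committing to it. Depending on the toolchain, linking may need `-pthread`.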

A last option you could try is to allocate memory manually so that MPFR has a fixed overhead. I've never tried it, so I don't know if it would help.
