0

I have to calculate in c binomial coefficients of the expression (x+y)**n, with n very large (order of 500-1000). The first algo to calculate binomial coefficients that came to my mind was multiplicative formula. So I coded it into my program as

long double binomial(int k, int m)
{
    int i,j;
    long double num=1, den=1;
    j=m<(k-m)?m:(k-m);
    for(i=1;i<=j;i++)
    {
        num*=(k+1-i);
        den*=i;
    }
    return num/den; 
}

This code is really fast on a single core thread, compared for example to recursive formula, although the latter one is less subject to rounding errors since involves only sums and not divisions. So I wanted to test these algos for great values and tried to evaluate 500 choose 250 (order 10^160). I have found that the "relative error" is less than 10^(-19), so basically they are the same number, although they differ something like 10^141.

So I'm wondering: Is there a way to evaluate the order of the error of the calculation? And is there some fast way to calculate binomial coefficients which is more precise than the multiplicative formula? Since I don't know the precision of my algo I don't know where to truncate the stirling's series to get better results..

I've googled for some tables of binomial coefficients so I could copy from those, but the best one I've found stops at n=100...

Raktim Biswas
  • 4,011
  • 5
  • 27
  • 32
  • You'd be better off with a big numbers library and integer multiplication/division. – Jean-François Fabre Sep 30 '16 at 20:06
  • Possible duplicate of [Number of combinations (N choose R) in C++](http://stackoverflow.com/questions/9330915/number-of-combinations-n-choose-r-in-c) – Jean-François Fabre Sep 30 '16 at 20:07
  • I don't think it's a duplicate since I know the formula is correct, but the calculations are rounded and so there are errors – Francesco Di Lauro Sep 30 '16 at 20:13
  • @FrancescoDiLauro How exact do you actually need it to be? – deamentiaemundi Sep 30 '16 at 20:32
  • The problem is that I could go well with that "precision", but I don't know what's the error's order as a function of n. – Francesco Di Lauro Sep 30 '16 at 20:54
  • By the way, multiplication is not less precise than addition. All else being equal, you might expect the multiplicative solution to have a smaller error because it involves fewer operations. That seems to be confirmed by my testing fwiw. – rici Oct 01 '16 at 04:31
  • I thought that the problem might be the rounding errors when you divide 2 very large integers. When you simply add you may also expect errors only on the last decimal considerated, right? – Francesco Di Lauro Oct 01 '16 at 07:19

4 Answers4

1

To get exact integer results for small k and m, a better solution might be (a slight variation of your code) :

  unsigned long binomial(int k, int m)
  {
   int i,j; unsigned long num=1;
   j=m<(k-m)?m:(k-m);
   for(i=1;i<=j;i++)
   {
    num*=(k+1-i);
    num/=i;
   }
   return num;
  }

Every time you get a combinatorial number after doing the division num/=i, so you won't get truncated. To get approximate results for bigger k and m, your solution might be good. But beware that long double multiplication is already much slower than the multiplication and division of integers (unsigned long or size_t). If you want to get bigger numbers exact, probably a big integer class must be coded or included from a library. You can also google if there's fast factorial algorithm for n! of extremely big integer n. That may help with combinatorics, too. Stirling's formula is a good approximation for ln(n!) when n is large. It all depends on how accurate you want to be.

Zhuoran He
  • 873
  • 9
  • 15
  • Is it better to divide inside the for cycle? I thought that it was less precise than just making one division, but maybe for large numbers is better to divide step by step – Francesco Di Lauro Sep 30 '16 at 20:51
  • `unsigned long` is the 64-bit integer type, so there won't be any error. It's only a matter of avoiding overflow as much as we can. – Zhuoran He Sep 30 '16 at 20:53
  • If you want to use `long double`, then dividing outside the loop is better. It's mainly because division is slower than multiplication. Accuracy should be the same. – Zhuoran He Sep 30 '16 at 20:54
  • Ok but if I have to calculate 250 choose 125 I can't with an unsigned long function, maybe the only solution is adding a library to deal with large integers – Francesco Di Lauro Sep 30 '16 at 21:06
  • Yeah. If you eventually decide to use a big integer `class` to get exact result, it's better to divide inside the loop. Because for big integers, the speed of every operation depends on how big the number is. So we should try our best to keep it small. – Zhuoran He Sep 30 '16 at 21:06
1

If you really want to use the multiplicative formula, I would recommend an exception based approach.

  1. Implement the formula with large integers (long long for example)
  2. Attempt division operations as soon as possible (as suggested by Zhuoran)
  3. Add code to check correctness of every division and multiplication
  4. Resolve incorrect divisions or multiplications, e.g.
    • try the division in loop proposed by Zhuoran, but if it fails resort back to the initial algorithm (accumulating the product of divisor in den)
    • store the unresolved multiplier, divisors in additional long integers and try to resolve them in next iteration loops
  5. If you really use large numbers then your result might not fit in long integer. then in that case you can switch to long double or use your personal LongInteger storage.

This is a skeleton code, to give you an idea:

long long binomial_l(int k, int m)
{
    int i,j;
    long long num=1, den=1;
    j=m<(k-m)?m:(k-m);
    for(i=1;i<=j;i++)
    {
        int multiplier=(k+1-i);
        int divisor=i;
        long long candidate_num=num*multiplier;
        //check multiplication
        if((candidate_num/multiplier)!=num)
        {
            //resolve exception...
        }
        else
        {
            num=candidate_num;
        }

        candidate_num=num/divisor;
        //check division
        if((candidate_num*divisor)==num)
        {
            num=candidate_num;
        }
        else
        {
            //resolve exception
            den*=divisor;
            //this multiplication should also be checked...
        }
    }
    long long candidate_result= num/den; 
    if((candidate_result*den)==num)
    {
        return candidate_result;
    }
    // you should not get here if all exceptions are resolved
    return 0;
}
SorMun
  • 105
  • 4
  • I don't understand how do I resolve incorrect divisions. I mean, if I make the same calculation every cycle, I would expect the same wrong result? Could you elaborate a bit? Thanks – Francesco Di Lauro Oct 01 '16 at 01:15
  • Well, The idea is to to implement the exception as a delayed algorithm. For example, in my code snapshot, I am attempting make the division at the earliest possible moment ("candidate_num=num/divisor" in my code). If the division is correct, I am taking the candidate, else I am "storing" the divisor in den, to delay the division when possible, in this case at "long long candidate_result=num/den". – SorMun Oct 01 '16 at 01:29
1

If you're just computing individual binomial coefficients C(n,k) with n fairly large but no larger than about 1750, then your best bet with a decent C library is to use the tgammal standard library function:

tgammal(n+1) / (tgammal(n-k+1) * tgammal(k+1))

Tested with the Gnu implementation of libm, that consistently produced results within a few ULP of the precise value, and generally better than solutions based on multiplying and dividing.

If k is small (or large) enough that the binomial coefficient does not overflow 64 bits of precision, then you can get a precise result by alternately multiplying and dividing.

If n is so large that tgammal(n+1) exceeds the range of a long double (more than 1754) but not so large that the numerator overflows, then a multiplicative solution is the best you can get without a bignum library. However, you could also use

expl(lgammal(n+1) - lgammal(n-k+1) - lgammal(k+1))

which is less precise but easier to code. (Also, if the logarithm of the coefficient is useful to you, the above formula will work over quite a large range of n and k. Not having to use expl will improve the accuracy.)

If you need a range of binomial coefficients with the same value of n, then your best bet is iterative addition:

void binoms(unsigned n, long double* res) {
  // res must have (n+3)/2 elements
  res[0] = 1;
  for (unsigned i = 2, half = 0; i <= n; ++i) {
    res[half + 1] = res[half] * 2;
    for (int k = half; k > 0; --k)
      res[k] += res[k-1];
    if (i % 2 == 0)
      ++half;
  }
}

The above produces only the coefficients with k from 0 to n/2. It has a slightly larger round-off error than the multiplicative algorithm (at least when k is getting close to n/2), but it's a lot quicker if you need all the coefficients and it has a larger range of acceptable inputs.

rici
  • 234,347
  • 28
  • 237
  • 341
0

This may not be what OP is looking for, but one can analytically approximate nCr for large n with binary entropy function. It is mentioned in

Community
  • 1
  • 1
Patrick the Cat
  • 2,138
  • 1
  • 16
  • 33
  • The binary entropy function is derived from Stirling's formula with no O(1/n) asymptotic expansion correction. If what the OP is looking for is just the probability function of a binomial distribution, the best solution would be to do it exactly for small n and use Gaussian approximation for `n>100, p=0.05--0.95`, and Poisson approximation for `n>100, p<0.05 or p>0.95`, I think. – Zhuoran He Sep 30 '16 at 20:58
  • Yeah one solution might be dividing and approximate when n choose k is "big". The problem is again how much would I truncate in such cases, and, since the execution time of my implementation of multiplication formula for now it's not a problem, if it's worth – Francesco Di Lauro Sep 30 '16 at 21:11