2

What is an efficient way of finding the number of ways k gifts can be chosen from N gifts where N can be very large (N ~ 10^18). That is we have to calculate N(C)K or N chose K. K can also be of the order of N.

tshepang
  • 12,111
  • 21
  • 91
  • 136
Akhil
  • 2,269
  • 6
  • 32
  • 39

4 Answers4

5

I guess there are no fast ways to compute such large numbers. You can approximate it using Stirling's formula

Konstantin
  • 24,271
  • 5
  • 48
  • 65
  • Stirling's formula is useless if we don't know how k compares to n. – Alexandre C. Jan 23 '11 at 17:11
  • 3
    @Alexandre C.: Huh, how can you say that? Stirling's formula approximates `n!`. So you approximate `n!, k! and (n - k)!` and get an approximation to `n! / (k! * (n - k)!)`. If you want to get an asymptotic sense of the behavior of `C(n, k)` then, yes, you need a relationship between `k` and `n`, but to approximate, no. – jason Jan 23 '11 at 17:47
  • @Jason: Computing 3 times log gamma is very fast, probably comparable to pow + exp + sqrt. The power of Stirling's formula (and the Stirling series) is that it allows you to get accurate asymptotics when you know the behavior of k and n together. You have better choices for approximating factorials. A good start is http://en.wikipedia.org/wiki/Lanczos_approximation . – Alexandre C. Jan 24 '11 at 10:07
  • 1
    @Alexandre C.: I'm not denying that. I'm denying your claim that Stirling's formula is useless if we can't compare `k` to `n`). – jason Jan 24 '11 at 14:28
  • @Jason: my comment was written in a hurry. The point is that Stirling's formula is *computationally* (almost) useless, and would be useful in the case where we would need an approximation of C(n, k(n)) (like it is the case in statistical mechanics, for instance). – Alexandre C. Jan 24 '11 at 15:04
  • 1
    @Alexandre C.: See, that's where I disagree with you. I am not denying your claim that the approximation you give is better (this is well known, it's Straight Outta Numerical Recipes). I'm denying your claim that the Stirling approximation is useless. – jason Jan 24 '11 at 15:18
3

The value of C(n, k) can be close to 2^n. (well, order of magnitude smaller, but that's not important here).

What important is that to store number 2^(10^18), you need 10^18 bits or ~ 10^17 bytes.
You might want to adjust problem definition, because there're no computers like that.

Others have already pointed out approximate formula, where you could store result as a floating point number, thus not spending more memory than necessary.

Nikita Rybak
  • 67,365
  • 22
  • 157
  • 181
3

Stirling's formula would be useful only if you had further asymptotic information such as k ~ n / 3 or k ~ log n. With no further information on your specific problem, you will not draw any information of Stirling's formula.

For your problem as stated, the most direct way to compute C(n, k) when k and n are large (and even when they are not large) is to use

log C(n, k) = log (n!) - (log (k!) + log ((n - k)!))

and

n! = gamma(n + 1).

The fact is that it is quite easy to come with an implementation of log gamma, and you then have

C(n, k) = exp (f(n + 1) - f(k + 1) - f(n - k + 1))

where f = log gamma .

You can find numerical algorithms for computing log gamma in Numerical Recipes, an old version is available there and you will find a sample implementation in Chapter 6.

Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
  • 1
    Also note that for many applications, you may even be happier with storing `ln(C(n, k)) = lngamma(n+1) - lngamma(k+1) - lngamma(n-k+1)`. The chance of overflow is considerably smaller. – Christopher Creutzig Jan 23 '11 at 17:20
  • can you explain what gamma is? I'm not familiar with it as a standard function? or is it just defined by gamma(n+1) = n!? – Chris Jan 24 '11 at 17:57
  • @Chris: Gamma(x) = integral(exp(-t) t^(x - 1) dt, t = 0..infinity). The formula Gamma(n) = (n-1)! holds by integrating by parts. There are fast methods to compute Gamma or log Gamma. – Alexandre C. Jan 24 '11 at 18:22
  • Ah, thanks. That rings some vague bells now. I figured it would probably be a continous function that worked at those discrete points but always nice to check. :) – Chris Jan 24 '11 at 18:24
3

Since variety is the spice of life, another approach is the following. The value (N Choose K)/2^N approaches a normal distribution with mean N/2 and Standard Deviation Sqrt[N]/2 and it does so quite quickly. We can therefore approximate (N Choose K) as 2^N * Normdist(x,0,1)/std where x =( k - N/2)/std and std is Sqrt[N]/2.
Normdist(x,0,1) = Exp(-x^2/2)*1/(Sqrt(2*Pi))

In terms of error this should get much better the larger the number, and a quick check using N as 113(?) shows an maximum error as a percent of the largest coefficient of less than 0.3%.

Not claiming its better than using Stirling formula, but think that it might avoid some of the n^n calculations and working out the log of these coefficients is a pretty simple calculation.

aronp
  • 799
  • 2
  • 6
  • 14
  • Thanks Alexandre, just to add this all works out to be Log(N Choose K) = (4*k - (4*Power(k,2))/n + Log(2) + n*(-1 + Log(4)) - Log(n*Pi))/2 which should be the fastest algorithm here (if not the best). – aronp Jan 24 '11 at 22:07