
The number of combinations of k items that can be chosen from N items is given by the following formula.

            N!
C = ------------------
     k! * (N - k)!

An example: how many combinations of 6 balls can be drawn from a drum of 48 balls in a lottery draw?
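As a sanity check for any optimized implementation (not part of the original question), Python's math module exposes this formula directly:

```python
import math

# C(48, 6): ways to draw 6 balls from a drum of 48
exact = math.factorial(48) // (math.factorial(6) * math.factorial(42))
print(exact)             # 12271512
print(math.comb(48, 6))  # same result, Python 3.8+
```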

Optimize this formula to run with the smallest big-O time complexity.

This question was inspired by the new WolframAlpha math engine and the fact that it can calculate extremely large combinations very quickly (e.g. the query linked below), and a subsequent discussion on the topic on another forum.

http://www97.wolframalpha.com/input/?i=20000000+Choose+15000000

I'll post some info/links from that discussion after some people take a stab at the solution.

Any language is acceptable.

Jens Schauder
Eoin Campbell

7 Answers


Python: O(min(k, n-k)^2)

def choose(n, k):
    k = min(k, n - k)    # use the smaller side; C(n, k) == C(n, n-k)
    p = q = 1
    for i in range(k):   # xrange in the original Python 2
        p *= n - i
        q *= 1 + i
    return p // q        # exact: q = k! always divides p

Analysis:

  • The sizes of p and q increase linearly inside the loop, if n-i and 1+i can be considered constant-size.
  • The cost of each multiplication then also increases linearly.
  • The sum over all iterations becomes an arithmetic series in k.

My conclusion: O(k^2)

If rewritten to use floating point numbers, the multiplications will be atomic operations, but we will lose a lot of precision. It even overflows for choose(20000000, 15000000). (Not a big surprise, since the result would be around 0.2119620413×10^4884378.)

def choose(n, k):
    k = min(k, n - k)
    result = 1.0
    for i in range(k):   # xrange in the original Python 2
        result *= (n - i) / (1 + i)   # true division; floats throughout
    return result
Jaco Van Niekerk
Markus Jarderot
  • Very pythonic, I was about to post more or less the same thing :P – Andrea Ambu Jun 07 '09 at 14:20
  • This will choke and die hilariously on 20000000 Choose 15000000. Also, it is only O(min[k,n-k]) if you indulge in the fiction that operations on huge integers can be performed in constant time. – PeterAllenWebb Jun 08 '09 at 14:14
  • My previous comment sounds dismissive to me on re-reading. Apologies if you took offense. But yes, the result for 20000000 Choose 15000000 has almost 5 million digits. The multiplicands in the later stages won't even fit into L2 cache on most processors, so things are bound to be very slow. What facilities does python have for extremely large floating point numbers? Does anyone know? – PeterAllenWebb Jun 09 '09 at 01:11
  • If both operands of the multiplication are larger than 1050 bits (70 digits * 15 bits/digit), then Python will use Karatsuba's multiplication algorithm. Its time complexity is only O(n^1.585), compared to the grade-school algorithm's O(n^2). The other operations have similar optimizations. – Markus Jarderot Jun 09 '09 at 21:37

Notice that WolframAlpha returns a "Decimal Approximation". If you don't need absolute precision, you could do the same thing by calculating the factorials with Stirling's Approximation.

Now, Stirling's approximation requires the evaluation of (n/e)^n, where e is the base of the natural logarithm, which will be by far the slowest operation. But this can be done using the techniques outlined in another stackoverflow post.

If you use double precision and repeated squaring to accomplish the exponentiation, the operations will be:

  • 3 evaluations of the Stirling approximation, each requiring O(log n) multiplications and one square-root evaluation.
  • 2 multiplications
  • 1 division

The number of operations could probably be reduced with a bit of cleverness, but the total time complexity is going to be O(log n) with this approach. Pretty manageable.
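To make the approach concrete, here is a sketch (my code, not from the answer, using the first correction term of Stirling's series) that works with the logarithm of the factorial, so plain doubles suffice and each evaluation is a handful of float operations:

```python
import math

def ln_factorial(n):
    # Stirling's series: ln n! ~ n*ln(n) - n + 0.5*ln(2*pi*n) + 1/(12n)
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n) + 1 / (12 * n)

def log10_binomial(n, k):
    # log10 of C(n, k), from the three log-factorials
    return (ln_factorial(n) - ln_factorial(k) - ln_factorial(n - k)) / math.log(10)

# The WolframAlpha example: C(20000000, 15000000) is about 10^4884377.33
```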

EDIT: There's also bound to be a lot of academic literature on this topic, given how common this calculation is. A good university library could help you track it down.

EDIT2: Also, as pointed out in another response, the values will easily overflow a double, so a floating point type with very extended precision will need to be used for even moderately large values of k and n.

PeterAllenWebb

I'd solve it in Mathematica:

Binomial[n, k]

Man, that was easy...

Pesto

Python: approximation in O(1)?

Using Python's decimal module to calculate an approximation. Since it does not use any explicit loop and the numbers are bounded in size, I think it will execute in O(1).

from decimal import Decimal, getcontext

# Enlarge the exponent range so huge results do not overflow
# (the default context's Emax is too small for these examples).
getcontext().Emax = 10**8

ln = lambda z: z.ln()
exp = lambda z: z.exp()
sinh = lambda z: (exp(z) - exp(-z))/2
sqrt = lambda z: z.sqrt()

pi = Decimal('3.1415926535897932384626433832795')
e = Decimal('2.7182818284590452353602874713527')

# Stirling's approximation of the gamma function.
# Simplification by Robert H. Windschitl.
# Source: http://en.wikipedia.org/wiki/Stirling%27s_approximation
gamma = lambda z: sqrt(2*pi/z) * (z/e*sqrt(z*sinh(1/z)+1/(810*z**6)))**z

def choose(n, k):
  n = Decimal(str(n))
  k = Decimal(str(k))
  return gamma(n+1)/gamma(k+1)/gamma(n-k+1)

Example:

>>> choose(20000000,15000000)
Decimal('2.087655025913799812289651991E+4884377')
>>> choose(130202807,65101404)
Decimal('1.867575060806365854276707374E+39194946')

Any higher, and it will overflow. The exponent seems to be limited to 40000000.

Markus Jarderot

Given a reasonable range of values for n and k, calculate them in advance and use a lookup table.

It's dodging the issue in some fashion (you're offloading the calculation), but it's a useful technique if you're having to determine large numbers of values.
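One way to build such a table (my sketch, not from the answer): fill Pascal's triangle with the recurrence C(n, k) = C(n-1, k-1) + C(n-1, k), which needs only integer additions and sidesteps factorials and divisions entirely:

```python
def build_choose_table(max_n):
    """Precompute C(n, k) for all 0 <= k <= n <= max_n via Pascal's rule.

    O(max_n^2) additions up front; each later query is an O(1) lookup.
    """
    table = [[1] * (n + 1) for n in range(max_n + 1)]
    for n in range(2, max_n + 1):
        for k in range(1, n):
            table[n][k] = table[n - 1][k - 1] + table[n - 1][k]
    return table

table = build_choose_table(48)
# table[48][6] is the lottery example, C(48, 6)
```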

Brian Agnew
  • I assume you mean calculate factorials in a lookup and divide the nth entry by the kth and (n-k)th entry. Otherwise you use a two dimensional lookup table (and would still need to specify an algorithm to avoid the horrifying divisions required in the 1-dimensional approach). – David Berger Jun 05 '09 at 17:08
  • I hadn't given it much thought at all, beyond making use of a lookup table for some/all of the above. Obviously you have penalties in terms of memory and the initial calculation (would you need *all* permutations ?), hence I hedged myself to some small degree by saying for 'reasonable' numbers of values, and deliberately not defining that! – Brian Agnew Jun 05 '09 at 20:00
  • I think memoizing factorials would be better if you need to do it for much data. – Andrea Ambu Jun 07 '09 at 14:24

MATLAB:

  • The cheater's way (using the built-in function NCHOOSEK): 13 characters, O(?)

    nchoosek(N,k)
    
  • My solution: 36 characters, O(min(k,N-k))

    a=min(k,N-k);
    prod(N-a+1:N)/prod(1:a)
    
gnovice

I know this is a really old question, but I struggled with this problem for a long while until I found a really simple solution written in VB 6. After porting it to C#, here is the result:

public long NChooseK(long n, long k)
{
    // long instead of the original int delays (but does not eliminate) overflow
    var result = 1L;
    for (var i = 1L; i <= k; i++)
    {
        // multiply first, then divide: the division is exact at every step
        result *= n - (k - i);
        result /= i;
    }

    return result;
}

The final code is so simple you won't believe it will work until you run it.

Also, the original article gives a nice explanation of how the author arrived at the final algorithm.
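For comparison (my port, not from the article): the same running product in Python, where arbitrary-precision integers remove the overflow worry. The division is exact at every step because after iteration i the accumulator holds C(n-k+i, i), which is always an integer:

```python
def n_choose_k(n, k):
    result = 1
    for i in range(1, k + 1):
        # multiply before dividing so the intermediate stays an integer
        result = result * (n - (k - i)) // i
    return result
```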

Henrique Miranda