
The number of combinations of k items that can be chosen from N items is given by the following formula.

            N!
C = ------------------
     k! * (N - k)!

An example: how many combinations of 6 balls can be drawn from a drum of 48 balls in a lottery draw?
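As a sanity check for any optimized implementation (not part of the original question), Python's math module exposes this formula directly:

```python
import math

# C(48, 6): ways to draw 6 balls from a drum of 48
exact = math.factorial(48) // (math.factorial(6) * math.factorial(42))
print(exact)             # 12271512
print(math.comb(48, 6))  # same result, Python 3.8+
```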

Optimize this formula to run with the smallest big-O time complexity.

This question was inspired by the new WolframAlpha math engine and the fact that it can calculate extremely large combinations very quickly (e.g. the query linked below), and a subsequent discussion on the topic on another forum.

http://www97.wolframalpha.com/input/?i=20000000+Choose+15000000

I'll post some info/links from that discussion after some people take a stab at the solution.

Any language is acceptable.

Jens Schauder
Eoin Campbell

7 Answers


Python: O(min(k, n-k)^2)

def choose(n, k):
    k = min(k, n - k)    # use the smaller side; C(n, k) == C(n, n-k)
    p = q = 1
    for i in range(k):   # xrange in the original Python 2
        p *= n - i
        q *= 1 + i
    return p // q        # exact: q = k! always divides p

Analysis:

  • The sizes of p and q increase linearly inside the loop, if n-i and 1+i can be considered constant-size.
  • The cost of each multiplication then also increases linearly.
  • The sum over all iterations becomes an arithmetic series in k.

My conclusion: O(k^2)

If rewritten to use floating point numbers, the multiplications will be atomic operations, but we will lose a lot of precision. It even overflows for choose(20000000, 15000000). (Not a big surprise, since the result would be around 0.2119620413×10^4884378.)

def choose(n, k):
    k = min(k, n - k)
    result = 1.0
    for i in range(k):   # xrange in the original Python 2
        result *= (n - i) / (1 + i)   # true division; floats throughout
    return result
Jaco Van Niekerk
Markus Jarderot
  • Very pythonic, I was about to post more or less the same thing :P – Andrea Ambu Jun 07 '09 at 14:20
  • This will choke and die hilariously on 20000000 Choose 15000000. Also, it is only O(min[k,n-k]) if you indulge in the fiction that operations on huge integers can be performed in constant time. – PeterAllenWebb Jun 08 '09 at 14:14
  • My previous comment sounds dismissive to me on re-reading. Apologies if you took offense. But yes, the result for 20000000 Choose 15000000 has almost 5 million digits. The multiplicands in the later stages won't even fit into L2 cache on most processors, so things are bound to be very slow. What facilities does python have for extremely large floating point numbers? Does anyone know? – PeterAllenWebb Jun 09 '09 at 01:11
  • If both operands of the multiplication are larger than 1050 bits (70 digits * 15 bits/digit), then Python will use Karatsuba's multiplication algorithm. Its time complexity is only O(n^1.585), compared to the grade-school algorithm's O(n^2). The other operations have similar optimizations. – Markus Jarderot Jun 09 '09 at 21:37

Notice that WolframAlpha returns a "Decimal Approximation". If you don't need absolute precision, you could do the same thing by calculating the factorials with Stirling's Approximation.

Now, Stirling's approximation requires the evaluation of (n/e)^n, where e is the base of the natural logarithm, which will be by far the slowest operation. But this can be done using the techniques outlined in another stackoverflow post.

If you use double precision and repeated squaring to accomplish the exponentiation, the operations will be:

  • 3 evaluations of the Stirling approximation, each requiring O(log n) multiplications and one square-root evaluation.
  • 2 multiplications
  • 1 division

The number of operations could probably be reduced with a bit of cleverness, but the total time complexity is going to be O(log n) with this approach. Pretty manageable.
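To make the approach concrete, here is a sketch (my code, not from the answer, using the first correction term of Stirling's series) that works with the logarithm of the factorial, so plain doubles suffice and each evaluation is a handful of float operations:

```python
import math

def ln_factorial(n):
    # Stirling's series: ln n! ~ n*ln(n) - n + 0.5*ln(2*pi*n) + 1/(12n)
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n) + 1 / (12 * n)

def log10_binomial(n, k):
    # log10 of C(n, k), from the three log-factorials
    return (ln_factorial(n) - ln_factorial(k) - ln_factorial(n - k)) / math.log(10)

# The WolframAlpha example: C(20000000, 15000000) is about 10^4884377.33
```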

EDIT: There's also bound to be a lot of academic literature on this topic, given how common this calculation is. A good university library could help you track it down.

EDIT2: Also, as pointed out in another response, the values will easily overflow a double, so a floating point type with very extended precision will need to be used for even moderately large values of k and n.

PeterAllenWebb

I'd solve it in Mathematica:

Binomial[n, k]

Man, that was easy...

Pesto

Python: approximation in O(1)?

Using Python's decimal module to calculate an approximation. Since it does not use any explicit loop and the numbers are bounded in size, I think it will execute in O(1).

from decimal import Decimal, getcontext

# Enlarge the exponent range so huge results do not overflow
# (the default context's Emax is too small for these examples).
getcontext().Emax = 10**8

ln = lambda z: z.ln()
exp = lambda z: z.exp()
sinh = lambda z: (exp(z) - exp(-z))/2
sqrt = lambda z: z.sqrt()

pi = Decimal('3.1415926535897932384626433832795')
e = Decimal('2.7182818284590452353602874713527')

# Stirling's approximation of the gamma function.
# Simplification by Robert H. Windschitl.
# Source: http://en.wikipedia.org/wiki/Stirling%27s_approximation
gamma = lambda z: sqrt(2*pi/z) * (z/e*sqrt(z*sinh(1/z)+1/(810*z**6)))**z

def choose(n, k):
  n = Decimal(str(n))
  k = Decimal(str(k))
  return gamma(n+1)/gamma(k+1)/gamma(n-k+1)

Example:

>>> choose(20000000,15000000)
Decimal('2.087655025913799812289651991E+4884377')
>>> choose(130202807,65101404)
Decimal('1.867575060806365854276707374E+39194946')

Any higher, and it will overflow. The exponent seems to be limited to 40000000.

Markus Jarderot

Given a reasonable range of values for n and k, calculate them in advance and use a lookup table.

It's dodging the issue in some fashion (you're offloading the calculation), but it's a useful technique if you're having to determine large numbers of values.
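One way to build such a table (my sketch, not from the answer): fill Pascal's triangle with the recurrence C(n, k) = C(n-1, k-1) + C(n-1, k), which needs only integer additions and sidesteps factorials and divisions entirely:

```python
def build_choose_table(max_n):
    """Precompute C(n, k) for all 0 <= k <= n <= max_n via Pascal's rule.

    O(max_n^2) additions up front; each later query is an O(1) lookup.
    """
    table = [[1] * (n + 1) for n in range(max_n + 1)]
    for n in range(2, max_n + 1):
        for k in range(1, n):
            table[n][k] = table[n - 1][k - 1] + table[n - 1][k]
    return table

table = build_choose_table(48)
# table[48][6] is the lottery example, C(48, 6)
```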

Brian Agnew
  • I assume you mean calculate factorials in a lookup and divide the nth entry by the kth and (n-k)th entry. Otherwise you use a two dimensional lookup table (and would still need to specify an algorithm to avoid the horrifying divisions required in the 1-dimensional approach). – David Berger Jun 05 '09 at 17:08
  • I hadn't given it much thought at all, beyond making use of a lookup table for some/all of the above. Obviously you have penalties in terms of memory and the initial calculation (would you need *all* permutations ?), hence I hedged myself to some small degree by saying for 'reasonable' numbers of values, and deliberately not defining that! – Brian Agnew Jun 05 '09 at 20:00
  • I think memoizing factorials would be better if you need to do it for much data. – Andrea Ambu Jun 07 '09 at 14:24

MATLAB:

  • The cheater's way (using the built-in function NCHOOSEK): 13 characters, O(?)

    nchoosek(N,k)
    
  • My solution: 36 characters, O(min(k,N-k))

    a=min(k,N-k);
    prod(N-a+1:N)/prod(1:a)
    
gnovice

I know this is a really old question, but I struggled with this problem for a long while until I found a really simple solution written in VB 6. After porting it to C#, here is the result:

public long NChooseK(long n, long k)
{
    // long instead of the original int delays (but does not eliminate) overflow
    var result = 1L;
    for (var i = 1L; i <= k; i++)
    {
        // multiply first, then divide: the division is exact at every step
        result *= n - (k - i);
        result /= i;
    }

    return result;
}

The final code is so simple you won't believe it will work until you run it.

Also, the original article gives a nice explanation of how the author arrived at the final algorithm.
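For comparison (my port, not from the article): the same running product in Python, where arbitrary-precision integers remove the overflow worry. The division is exact at every step because after iteration i the accumulator holds C(n-k+i, i), which is always an integer:

```python
def n_choose_k(n, k):
    result = 1
    for i in range(1, k + 1):
        # multiply before dividing so the intermediate stays an integer
        result = result * (n - (k - i)) // i
    return result
```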

Henrique Miranda