44

What is the most efficient method to evaluate the value of "n choose k" ? The brute force way I think would be to find n! / k! / (n-k)! by calculating each factorial separately.

A better strategy may be to use DP according to this recursive formula, nCk == (n-1)C(k-1) + (n-1)C(k). Is there any other better method to evaluate n choose k in terms of complexity and avoiding risk of overflow?

Nikunj Banka
  • 11,117
  • 16
  • 74
  • 112
  • Factorial computation is much more efficient than your recursive alternative in both space and time terms – SomeWittyUsername Mar 08 '13 at 19:55
  • 2
    Well, for starters you can replace `n!/k!` with `n*(n-1)*(n-2)*...*(k+1)` No point in calculating `n!` and `k!` in full when many of the factors cancel out. – Tim Goodman Mar 08 '13 at 19:56
  • What range of n are you considering? – Andrew Morton Mar 08 '13 at 20:32
  • @AndrewMorton I have to calculate n choose k where n is <1000000 and k is < 1000000. – Nikunj Banka Mar 09 '13 at 03:03
  • possible duplicate of [Algorithm to return all combinations of k elements from n](http://stackoverflow.com/questions/127704/algorithm-to-return-all-combinations-of-k-elements-from-n) – Toto Mar 10 '13 at 09:38
  • 6
    @M42: this question is not a duplicate of the one you link to. That question asks for all combinations of k elements from n, whereas this question only wants the *number* of such combinations. – Luke Woodward Mar 10 '13 at 12:32

13 Answers13

56

Here is my version, which works purely in integers (the division by k always produces an integer quotient) and is fast at O(k):

function choose(n, k)
    if k == 0 return 1
    return (n * choose(n - 1, k - 1)) / k

I wrote it recursively because it's so simple and pretty, but you could transform it to an iterative solution if you like.

Beginner
  • 5,277
  • 6
  • 34
  • 71
user448810
  • 17,381
  • 4
  • 34
  • 59
  • 1
    It's not O(*k*); *k* is strictly less than *n*, so you can't ignore the contribution of *n* to the run-time. At best, you can say it is O(*k* M(*n*)), where M(*n*) is the speed of your multiplication algorithm. – chepner Mar 08 '13 at 20:13
  • 6
    Correct, but pedantic. The function stated above makes O(k) multiplications and divisions. I ignored the bit-complexity of the operations themselves. – user448810 Mar 08 '13 at 20:19
  • This function calculates `n!/k!`. That's not what the question is about – SomeWittyUsername Mar 08 '13 at 20:28
  • 6
    @icepack: No, it doesn't. The numerator ranges from n to n-k+1. The denominator ranges from k to 1. Thus, choose(9,4) = (9*8*7*6) / (4*3*2*1) = 126, which is correct. By contrast, 9!/4! = 362880 / 24 = 15120. – user448810 Mar 08 '13 at 20:40
  • 1
    This is the multiplicative method in a recursion form. It is indeed O(k) and it is the fastest it can get unless an estimation of k! using Stirling's approximation is good enough (http://en.wikipedia.org/wiki/Stirling%27s_approximation). There is a divide and conquer version of factorial that *might* help too (http://gmplib.org/manual/Factorial-Algorithm.html) – Pedrom Mar 08 '13 at 21:57
  • I think this gets overflown very easily in C++, I get all kinds of strange results and/or errors... I try to calculate the sum of n choose n-k of some range: http://pastebin.com/jhNAp8KD – shinzou Jan 04 '15 at 11:25
  • @user448810 the distinction is not pedantic. There is an entire field of research dedicated to improving calculations of such numbers. See the paper "On the Complexity of Calculating Factorials". https://www.sciencedirect.com/science/article/abs/pii/0196677485900069. If that doesn't persuade you, I recommend you plot the run time of your method as a function of k. You will see it does not grow linearly. – Quantum Guy 123 Jun 30 '22 at 19:19
41

You could use the Multiplicative formula for this:

enter image description here

http://en.wikipedia.org/wiki/Binomial_coefficient#Multiplicative_formula

Pedrom
  • 3,823
  • 23
  • 26
  • 19
    we can speed it up by calculating (n choose n-k) instead of (n choose k) when n-k < k. – Nikunj Banka Mar 09 '13 at 14:06
  • 3
    Just take into account that `(n - (k-i)) / i`may not be an integer – pomber Jun 12 '16 at 01:21
  • 1
    @pomber the individual `factor (n - (k-i)) / i` may in fact not be an integer BUT the product of the factors from `x=1 upto y` will always be divisible by `y` (because there are exactly y consecutive integers) – Lambder Dec 23 '16 at 12:11
  • 2
    Slightly updated formula: https://wikimedia.org/api/rest_v1/media/math/render/svg/652661edd20c8121e58a2b26844ce46c24180f0f – trig-ger Jan 16 '17 at 11:05
  • these factorials are quite costly to compute. You'd get a speed up by using the method seen here: https://cs.stackexchange.com/a/14476 – Quantum Guy 123 Jun 28 '22 at 23:12
9

Probably the easiest way to compute binomial coefficients (n choose k) without overflowing is to use Pascal's triangle. No fractions or multiplications are necessary. (n choose k). The nth row and kth entry of Pascal's triangle gives the value.

Take a look at this page. This is an O(n^2) operation with only addition, which you can solve with dynamic programming. It's going to be lightning fast for any number that can fit in a 64-bit integer.

Andrew Mao
  • 35,740
  • 23
  • 143
  • 224
6

If you're going to calculate many combinations like this, calculating the Pascal's Triangle is sure the best option. As you already know the recursive formula, I think I can past some code here:

MAX_N = 100
MAX_K = 100

C = [[1] + [0]*MAX_K for i in range(MAX_N+1)]

for i in range(1, MAX_N+1):
    for j in range(1, MAX_K+1):
        C[i][j] = C[i-1][j-1] + C[i-1][j];

print C[10][2]
print C[10][8]
print C[10][3]
fabiomaia
  • 592
  • 9
  • 19
Juan Lopes
  • 10,143
  • 2
  • 25
  • 44
3

After hitting a similar problem, I decided to compile the best solutions if have seen and run a simple test on each for a couple different values of n and k. I started with 10 or so functions and weeded out the ones that were just plain wrong or stopped working at specific values. Of all of the solutions, the answer above by user448810 is the cleanest and easiest to implement, I quite like it. Below will be code including each test i ran, the number of times i ean each function for each test, each functions code, the functions output and the time it took to get that output. I only did 20000 runs, there were still fluctuations in the time if i re ran the tests, but you should get a general sense of how well each worked.

//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    //EXPECTED VALUES
    //x choose x = 1
    //9 choose 4 =126
    //52 choose 5 = 2598960;
    //64 choose 33 = 1.777090076065542336E18;

    //# of runs for each test: 20000
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//https://stackoverflow.com/a/12983878/4285191
public static double combination(long n, long k) {
    double sum=0;
    for(long i=0;i<k;i++) {
        sum+=Math.Log10(n-i);
        sum-=Math.Log10(i+1);
    }
    return Math.Pow(10, sum);
}
/*
    10 choose 10
    0.9999999999999992
    Elapsed=00:00:00.0000015
    9 choose 4
    126.00000000000001
    Elapsed=00:00:00.0000009
    52 choose 5
    2598959.9999999944
    Elapsed=00:00:00.0000013
    64 choose 33
    1.7770900760655124E+18
    Elapsed=00:00:00.0000058
*/


//........................................................
//https://stackoverflow.com/a/19125294/4285191
public static double BinomCoefficient(long n, long k) {
    if (k > n) return 0;
    if (n == k) return 1; // only one way to chose when n == k
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so it is quicker to iterate over a smaller k than a larger one.
    double c = 1;
    for (long i = 1; i <= k; i++) {
        c *= n--;
        c /= i;
    }
    return c;
}
/*
    10 choose 10
    1
    Elapsed=00:00:00
    9 choose 4
    126
    Elapsed=00:00:00.0000001
    52 choose 5
    2598960
    Elapsed=00:00:00.0000001
    64 choose 33
    1.7770900760655432E+18
    Elapsed=00:00:00.0000006
*/

//........................................................
//https://stackoverflow.com/a/15302448/4285191
public static double choose(long n, long k) {
    if (k == 0) return 1;
    return (n * choose(n-1, k-1)) / k;
}
/*
    10 choose 10
    1
    Elapsed=00:00:00.0000002
    9 choose 4
    126
    Elapsed=00:00:00.0000003
    52 choose 5
    2598960
    Elapsed=00:00:00.0000004
    64 choose 33
    1.777090076065543E+18
    Elapsed=00:00:00.0000008
*/

//........................................................
//My own version which is just a mix of the two best above.
public static double binomialCoeff(int n, int k) {
    if (k > n) return 0;
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so it is quicker to iterate over a smaller k than a larger one.
    double recusion(long n, long k) {
        if (k == 0) return 1; // only one way to chose when n == k
        return (n * recusion(n-1, k-1)) / k;
    }
    return recusion(n,k);
}
/*
    10 choose 10
    1
    Elapsed=00:00:00
    9 choose 4
    126
    Elapsed=00:00:00.0000001
    52 choose 5
    2598960
    Elapsed=00:00:00.0000002
    64 choose 33
    1.777090076065543E+18
    Elapsed=00:00:00.0000007
*/

//........................................................
//https://en.wikipedia.org/wiki/Binomial_coefficient
public static double binomial(long n, long k) {
    // take advantage of symmetry
    if (k > n-k)  k = n-k;
    
    long c = 1;
    for (long i = 1; i <= k; i++, n--) {
        // return 0 on potential overflow
        if (c/i >= long.MaxValue/n) return 0;
        // split c * n / i    into (c / i * i) + (c % i * n / i)
        c = (c / i * n) + (c % i * n / i); 
    }
    
    return c;
}

/*
    10 choose 10
    1
    Elapsed=00:00:00.0000006
    9 choose 4
    126
    Elapsed=00:00:00.0000002
    52 choose 5
    2598960
    Elapsed=00:00:00.0000003
    64 choose 33
    1.7770900760655424E+18
    Elapsed=00:00:00.0000029
*/
ZXYNINE
  • 712
  • 4
  • 5
2

If you have a lookup table of factorials then the calculation of C(n,k) will be very fast.

Andrew Morton
  • 24,203
  • 9
  • 60
  • 84
  • For big number of n and k that lookup table might be prohibitive. Also there should be an option for values outside that table. – Pedrom Mar 08 '13 at 19:58
  • @Pedrom There was no mention of limitations on the magnitude of numbers in the question. It's tagged `language-agnostic` and `algorithms`. – Andrew Morton Mar 08 '13 at 20:05
2

The problem with the n!/k!(n-k)! approach is not so much the cost as the issue with ! growing very rapidly so that, even for values of nCk which are well within the scope of, say, 64-bit integers, intermediate calculations are not. If you don't like kainaw's recursive addition approach you could try the multiplicative approach:

nCk == product(i=1..k) (n-(k-i))/i

where product(i=1..k) means the product of all the terms when i takes the values 1,2,...,k.

High Performance Mark
  • 77,191
  • 7
  • 105
  • 161
  • You're right about the possibility of overflow corrupting things well before the final answer stops fitting in a machine word, but I don't like your solution: it will produce fractional values for some factors, e.g. for the i=2 factor when n=4 and k=3. (Of course the factors will multiply together in the end to give an integer, but your way means intermediate results need to be stored in floating point -- yuck!) – j_random_hacker Aug 11 '13 at 00:24
2

The fastest way is probably to use the formula, and not pascals triangle. Let's start not to do multiplications when we know that we're going to divide by the same number later. If k < n/2, let's have k = n - k. We know that C(n,k) = C(n,n-k) Now :

n! / (k! x (n-k)!) = (product of numbers between (k+1) and n) / (n-k)!

At least with this technique, you're never dividing by a number that you used to multiply before. You have (n-k) multiplications, and (n-k) divisions.

I'm thinking about a way to avoid all divisions, by finding GCDs between the numbers that we have to multiply, and those we have to divide. I'll try to edit later.

bruce_ricard
  • 743
  • 1
  • 7
  • 15
  • Finding the GCD will surely reduce the amount of operations. Unfortunately, the GCD finding for itself would be a much heavier task. – SomeWittyUsername Mar 08 '13 at 20:24
  • Yes I'm afraid of that. But the GCDs would be computed on small numbers, when the multiplication has a big one. And actually I'm not sure that a GCD is harder than a division. – bruce_ricard Mar 08 '13 at 20:30
  • I tend to be skeptical, but it would be interesting to see the results :) – SomeWittyUsername Mar 08 '13 at 20:34
  • Division and GCD both have a O(n^2) complexity for 2 numbers of size n. Here we would calculate division on a big and a small number, whereas the GCD would be on 2 small numbers, but we would need to do it for all the numbers. If I had to do it by hand, I think I'd try to find at least the obvious multiples and GCDs, to avoid doing useless divisions. – bruce_ricard Mar 08 '13 at 20:36
  • If you want the prime factorization of C(n,k), you can use Krummer's Theorem. (You need to know all the primes less than or equal to n.) http://planetmath.org/encyclopedia/KummersTheorem.html This doesn't quite avoid divisions, since you need to be able to express k and n-k base p for each prime p. – rici Mar 08 '13 at 20:36
  • @rici : I don't care about the prime decomposition of C(n,k), I even don't care about any prime factor, I just want to avoid divisions by computing GCDs. – bruce_ricard Mar 08 '13 at 20:38
  • @double_squeeze: if you have the prime decomposition of C(n,k), you can compute C(n,k) with only multiplications. Isn't that your goal? – rici Mar 08 '13 at 20:39
2

I haven't seen this answer before in case it helps someone else

def choose(n, k):
   nominator = n
   for i in range(1,k):
      nominator *= (n-i)
      k *= i 
   return nominator/k
jenn
  • 19
  • 2
1

Using recursive functions to solve this problem is not optimal in terms of space complexity. This is because recursive calls will build a call stack. I think it is possible to achieve the computation of n Choose k with k <= n in a linear time complexity and constant space complexity.

For 0 < k <= n, the maximum of n, k and n-k is n, therefore the idea is to only compute n! and to infer in the same loop, the values for k! and (n-k)!. Thus the final time complexity is O(n).

Such a function could look like this:

public static long combinationsCount(int n, int k) {
    //this will hold the result for n!
    long a = 1;
    //this will hold the result for k!
    long b = 1;
    //this will hold the result for (n-k)!
    long c = 1;        
    for (int i = 1; i <= n; i++) {
        a *= i;
        //if the current value of i is k, then a is equal to k!
        if (i == k) {
            b = a;
        }
        //if the current value of i is n-k, then a is equal to (n-k)!
        if (i == n-k) {
            c = a;
        }
     }
     //n choose k formula
     return a/(b*c);
}
  • Thanks. This is just a suggested answer to the initial question. – Rabih Sarieddine May 27 '22 at 07:51
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Sumit Sharma May 28 '22 at 17:00
1

Using Pascal's triangle is a fast method for calculating n choose k. You can refer to the answer here for more info.

The fastest method I know of would be to make use of the results from "On the Complexity of Calculating Factorials". Just calculate all 3 factorials, then perform the two division operations, each with complexity M(n logn).

1

Here is a very simple C++ implementation that minimizes the risk of overflow. You can also replace each long long with double to allow larger numbers at the cost of imprecise results.

long long combinations(int n, int k) {
    long long product = 1;
    for(int i = 1; i <= k; i++)
        product = product*(n - k + i)/i; // Must do mul before div
    return product;
}

The denominator will contain all the numbers from [1 .. k], while the numerator will contain all the numbers from [n - k + 1 .. n], noting that the first [1 .. n - k] factors are cancelled out by the (n - k)! factor on the denominator.

The order of how to perform these operations matter. It can be proven that (n - k + i) is divisible by i across all of the iteration steps, so none of the factors are off, as after i iterations, the numerator contains a product of i consecutive integers, which then implies the numerator is divisible by i.

-5

"Most efficient" is a poor request. What are you trying to make efficient? The stack? Memory? Speed? Overall, my opinion is that the recursive method is most efficient because it only uses addition (a cheap operation) and the recursion won't be too bad for most cases. The function is:

nchoosek(n, k)
{
    if(k==0) return 1;
    if(n==0) return 0;
    return nchoosek(n-1, k-1)+nchoosek(n-1,k);
}
kainaw
  • 4,256
  • 1
  • 18
  • 38
  • 1
    That's a tree recursion and for big values for n and k it might not finish at all. – Pedrom Mar 08 '13 at 19:53
  • 6
    It's `O(2^n)` time and `O(n)` space. Factorial computation is `O(n)` time and `O(1)` space. – SomeWittyUsername Mar 08 '13 at 19:54
  • I disagree. This is effectively computing Pascal's triangle; it's most definitely **NOT** `O(2^n)` - it's `O(n^2)`. This is a square, not a tree. Plus, it will never overflow if the result is storable in a long, and addition is much faster than multiplication and division. Even more, you can memoize with `f(n, k) = f(n, n-k)`, and all the edge cases are 1. – Andrew Mao Mar 08 '13 at 20:05
  • "the recursion won't be too bad for most cases"? that's a poor answer! in "most cases" it will perform really, really bad because of the tree recursion, for n in the range of 20's or 30's the function won't even stop in any reasonable amount of time – Óscar López Mar 08 '13 at 20:05
  • @AndrewMao That recursion is not a tail recursion and has two recursion calls, therefore the inner recursive process implies a tree recursion. This is not too much different from a fibonacci implementation. It could indeed result in a stack overflow if n and k are big enough. – Pedrom Mar 08 '13 at 20:09
  • Again, I strongly disagree. You can get an idea of why this is not a tree just by looking at Pascal's triangle. I agree that the OP did not address this but for practical purposes (i.e. not BigInteger) it will be faster than multiplication/division algorithms. – Andrew Mao Mar 08 '13 at 20:10
  • 2
    This IS most definitely exponential. The complexity of computing nchoosek(n,k) is nchoosek(n,k) at least, since your base cases are 0 and 1. If you do the same with dynamic programming, you'll get a n^2 complexity, here you're calculating the same results many times. – bruce_ricard Mar 08 '13 at 20:12
  • 2
    @AndrewMao Each call to this function results in 2 nodes in the recursion tree. The recursion stops after n steps (I assume k <= n but that doesn't matter in general case) ==> 2^n nodes in the tree, O(2^n) running time. Memoization is orthogonal to this method, the benefits of memoization aren't related to recursion. – SomeWittyUsername Mar 08 '13 at 20:12
  • I did not explain this in the answer, because I felt that it would make it more confusing... but this will use many of the same values on both sides of the addition. So, by using memoization, you effectively remove one side of the addition from the recursion. You take a hit in memory for a huge savings in recursion. – kainaw Mar 08 '13 at 20:29
  • @kainaw this has nothing to do with the recursion. Straightforward calculation of Pascal triangle (which implies memoization) will most likely perform better – SomeWittyUsername Mar 08 '13 at 20:32
  • 1
    This is a poor method even with memoization. C(10^12, 2) would require a trillion additions, but the multiplicative formula is instantaneous. – Dave Radcliffe Oct 25 '18 at 04:58