9

I need a way of calculating combinations without running out of memory. Here's what i have so far.

public static long combination(long n, long k) // nCk
{
    return (divideFactorials(factorial(n), ((factorial(k) * factorial((n - k))))));
}

public static long factorial(long n)
{
    long result;
    if (n <= 1) return 1;
    result = factorial(n - 1) * n;
    return result;
}

public static long divideFactorials(long numerator, long denominator)
{
    return factorial(Math.Abs((numerator - denominator)));
}

I have tagged it as C#, but the solution should ideally be language independent.

Kjuly
  • 34,476
  • 22
  • 104
  • 118
Nyx
  • 1,273
  • 6
  • 19
  • 32
  • 4
    These numbers are called "binomial coefficients" and as a very common object, have appeared before in SO: http://stackoverflow.com/q/4256188/422353 – madth3 Oct 19 '12 at 23:35
  • 1
    What kind of combination exactly are you trying to get? Is it simply nCk, or is it something else? I am just asking because your comment at the top says "nCk" but your code does not directly calculate that. – philkark Oct 19 '12 at 23:36
  • 3
    Yes, add this line to factorial(): `if (n > 20) throw new OverflowException();` – Hans Passant Oct 19 '12 at 23:42

6 Answers6

23

One of the best methods for calculating the binomial coefficient I have seen suggested is by Mark Dominus. It is much less likely to overflow with larger values for N and K than some other methods.

public static long GetBinCoeff(long N, long K)
{
   // This function gets the total number of unique combinations based upon N and K.
   // N is the total number of items.
   // K is the size of the group.
   // Total number of unique combinations = N! / ( K! (N - K)! ).
   // This function is less efficient, but is more likely to not overflow when N and K are large.
   // Taken from:  http://blog.plover.com/math/choose.html
   //
   long r = 1;
   long d;
   if (K > N) return 0;
   for (d = 1; d <= K; d++)
   {
      r *= N--;
      r /= d;
   }
   return r;
}
Bob Bryan
  • 3,687
  • 1
  • 32
  • 45
  • Since r is always non-negative, better to use ulong instead of long to allow larger coefficients to be calculated without overflow. – sean Jan 09 '18 at 17:13
  • Even better use BigInteger from System.Numerics. For example: GetBinCoeff(64,33) returns 1.77E18 when implemented with BigInteger (the correct answer) but 3.43E17 and -2.68E17 when implemented with ulong and long respectively. – sean Dec 03 '18 at 14:40
  • 1
    @sean In standard C# usage, unsigned integers are reserved for bit fields or other special uses so don't do that. Calculations are always performed signed. But of course, feel free to use BigInteger if needed. – relatively_random Mar 10 '19 at 13:58
9

Here is a solution which is very similar to Bob Byran, but checks two more preconditions to speed up the code.

    /// <summary>
    /// Calculates the binomial coefficient (nCk) (N items, choose k)
    /// </summary>
    /// <param name="n">the number items</param>
    /// <param name="k">the number to choose</param>
    /// <returns>the binomial coefficient</returns>
    public static long BinomCoefficient(long n, long k)
    {
        if (k > n) { return 0; }
        if (n == k) { return 1; } // only one way to chose when n == k
        if (k > n - k) { k = n - k; } // Everything is symmetric around n-k, so it is quicker to iterate over a smaller k than a larger one.
        long c = 1;
        for (long i = 1; i <= k; i++)
        {
            c *= n--;
            c /= i;
        }
        return c;
    }
Moop
  • 3,414
  • 2
  • 23
  • 37
  • The check `n == k` should probably not be there. It is an optimization for a statistically rare case. – Evg Apr 23 '19 at 19:14
7
public static long combination(long n, long k)
    {
        double sum=0;
        for(long i=0;i<k;i++)
        {
            sum+=Math.log10(n-i);
            sum-=Math.log10(i+1);
        }
        return (long)Math.pow(10, sum);
    }
Victor Mukherjee
  • 10,487
  • 16
  • 54
  • 97
  • 3
    Even though the original post uses longs, your return value should really be a double or long double. As you did the calculations using doubles, there is really no point in converting it back into a long, as the answer is not necessarily 100% exact. Also it limits your values for n and k even more. – philkark Oct 20 '12 at 00:26
  • 5
    this is not a good solution. for example, combination(52,5) should return 2598960, not 2598959. The Mark Dominus one is much better. – sapbucket Nov 05 '15 at 23:09
  • 1
    This is bad solution. Consider combination(7,7), combination(10,10) which should be 1, not zero. This is even worse, as it yields correct 1 for combination(1,1), combination(2,2) up to seven, giving false hope. This is obwiously caused by the casting to long, and the loss of precision for some numbers. Making the function return double and removing the cast will fix this, but the precission will still be not optimal. – Piotr Falkowski May 22 '16 at 17:22
  • This would be the best solution though (when the long casting removed and result returned as double) when the numbers are greater than couple of thousends, as Mark Dominus solution overflows then. – Piotr Falkowski May 22 '16 at 17:40
3

Looking at your code, it is no wonder that you will run out of memory quite fast. Your method divideFactorials calls the method factorial and uses as argument the difference "numerator-denominator". That difference is very likely going to be very large according to your code and you will be stuck in a very long loop in your factorial method.

If it is really just about finding nCk (which I assume because your comment in your code), just use:

public static long GetnCk(long n, long k)
{
    long bufferNum = 1;
    long bufferDenom = 1;

    for(long i = n; i > Math.Abs(n-k); i--)
    {
        bufferNum *= i;
    }

    for(long i = k; i => 1; i--)
    {
        bufferDenom *= i;
    }

    return (long)(bufferNom/bufferDenom);
}

Of course using this method you will run out of range very fast, because a long does not actually support very long numbers, so n and k have to be smaller than 20.

Supposing that you actually work with very large numbers you could use doubles instead of longs, as the powers become more and more significant.

Edit: If you use large numbers you could also use Stirling's Formula:

As n becomes large ln(n!) -> n*ln(n) - n.

Putting this into code:

public static double GetnCk(long n, long k)
{
    double buffern = n*Math.Log(n) - n;
    double bufferk = k*Math.Log(k) - k;
    double bufferkn = Math.Abs(n-k)*Math.Log(Math.Abs(n-k)) - Math.Abs(n-k);

    return Math.Exp(buffern)/(Math.Exp(bufferk)*Math.Exp(bufferkn));
}

I only propose this answer, as you said language independent, the C# code is just used to demonstrate it. Since you need to use large numbers for n and k for this to work, i propose this as a general way for finding the binomial coefficient for large combinations.

For cases were n and k are both smaller than around 200-300, you should use the answer Victor Mukherjee proposed, as it is exact.

Edit2: Edited my first code.

phant0m
  • 16,595
  • 5
  • 50
  • 82
philkark
  • 2,417
  • 7
  • 38
  • 59
  • I tried out Victor's answer to about 20, 000 iterations and it ran perfectly. However, I did run out of memory at that range. If I need a greater range, I'll probably use this. Thank you for your answer. – Nyx Oct 20 '12 at 00:50
  • @Marv Why would you run out of memory? It's not recursive and there are no datastructures involved. – phant0m Oct 20 '12 at 12:03
  • @phant0m You're right. I create several data structures on every iteration. I guess the choice of algorithm won't change a thing, except maybe the time it takes. – Nyx Oct 20 '12 at 17:15
  • In 5th line of second snippet in Math.Log(Math.Abs(n-k)) there is a problem, when n and k are equal the -Infinity (result of Log(0)) propagates to NaN. – Piotr Falkowski May 22 '16 at 18:10
1

Just for the sake of completion: the standard C math library has implementations of both Γ and lnΓ (called tgamma and lgamma), where

Γ(n) &equals; (n-1)!

The library computation is certainly faster and more accurate than summing logarithms. For a lot more information, see Wikipedia and Mathworld.

rici
  • 234,347
  • 28
  • 237
  • 341
1

Using some functions from the responses in this question, I created a list of function to do the calculation and I ran them each at different values 20000 times to see how well they worked. You should be able to combine this with some caching of factorials to use far larger values. Heres what i said in the other post located- https://stackoverflow.com/a/71789681/4285191

After hitting a similar problem, I decided to compile the best solutions if have seen and run a simple test on each for a couple different values of n and k. I started with 10 or so functions and weeded out the ones that were just plain wrong or stopped working at specific values. Of all of the solutions, the answer above by user448810 is the cleanest and easiest to implement, I quite like it. Below will be code including each test i ran, the number of times i ean each function for each test, each functions code, the functions output and the time it took to get that output. I only did 20000 runs, there were still fluctuations in the time if i re ran the tests, but you should get a general sense of how well each worked.

//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    //EXPECTED VALUES
    //x choose x = 1
    //9 choose 4 =126
    //52 choose 5 = 2598960;
    //200 choose 5 = 2535650040;
    //64 choose 33 = 1.777090076065542336E18;

    //# of runs for each test: 20000
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//https://stackoverflow.com/a/12983878/4285191
public static double combination(int n, int k) {
    double sum=0;
    for(long i=0;i<k;i++) {
        sum+=Math.Log10(n-i);
        sum-=Math.Log10(i+1);
    }
    return Math.Pow(10, sum);
}
/*
    4 choose 4
    1
    Elapsed Time is 94349 ticks
    9 choose 4
    126
    Elapsed Time is 94534 ticks
    52 choose 5
    2598959.99999999
    Elapsed Time is 112340 ticks
    200 choose 5
    2535650040
    Elapsed Time is 113253 ticks
    64 choose 33
    1.77709007606551E+18
    Elapsed Time is 717255 ticks
*/


//........................................................
//https://stackoverflow.com/a/19125294/4285191
public static double BinomCoefficient(int n, int k) {
    if (k > n) return 0;
    if (n == k) return 1; // only one way to chose when n == k
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so it is quicker to iterate over a smaller k than a larger one.
    double c = 1;
    for (long i = 1; i <= k; i++) {
        c *= n--;
        c /= i;
    }
    return c;
}
/*
    4 choose 4
    1
    Elapsed Time is 6181 ticks
    9 choose 4
    126
    Elapsed Time is 16989 ticks
    52 choose 5
    2598960
    Elapsed Time is 20636 ticks
    200 choose 5
    2535650040
    Elapsed Time is 20659 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 131757 ticks
*/

//........................................................
//https://stackoverflow.com/a/15302448/4285191
public static double choose(int n, int k) {
    if (k == 0) return 1;
    return (n * choose(n-1, k-1)) / k;
}
/*
    4 choose 4
    1
    Elapsed Time is 24789 ticks
    9 choose 4
    126
    Elapsed Time is 14157 ticks
    52 choose 5
    2598960
    Elapsed Time is 17671 ticks
    200 choose 5
    2535650040
    Elapsed Time is 17827 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 134960 ticks
*/

//........................................................
//My own version which is just a mix of the two best above.
public static double binomialCoeff(int n, int k) {
    if (k > n) return 0;
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so it is quicker to iterate over a smaller k than a larger one.
    double recusion(long n, long k) {
        if (k == 0) return 1; // only one way to chose when n == k
        return (n * recusion(n-1, k-1)) / k;
    }
    return recusion(n,k);
}
/*
    4 choose 4
    1
    Elapsed Time is 6968 ticks
    9 choose 4
    126
    Elapsed Time is 19756 ticks
    52 choose 5
    2598960
    Elapsed Time is 25528 ticks
    200 choose 5
    2535650040
    Elapsed Time is 26350 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 156224 ticks
*/

//........................................................
//https://en.wikipedia.org/wiki/Binomial_coefficient
public static double binomial(int n, int k) {
    // take advantage of symmetry
    if (k > n-k)  k = n-k;

    long c = 1;
    for (long i = 1; i <= k; i++, n--) {
        // return 0 on potential overflow
        if (c/i >= long.MaxValue/n) return 0;
        // split c * n / i    into (c / i * i) + (c % i * n / i)
        c = (c / i * n) + (c % i * n / i); 
    }

    return c;
}

/*
    4 choose 4
    1
    Elapsed Time is 7661 ticks
    9 choose 4
    126
    Elapsed Time is 46471 ticks
    52 choose 5
    2598960
    Elapsed Time is 59829 ticks
    200 choose 5
    2535650040
    Elapsed Time is 62840 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 458198 ticks
*/
    
    
    
    
    
    
    
    
    
    
public static double BinomialCoefficient(int n, int k) {
    if (k > n) return 0;
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so use the smaller k.
    double recusion(long n, long k) => ((k==0)? 1 : ((n * recusion(n-1, k-1)) / k));
    return recusion(n,k);
}
/*
    4 choose 4
    1
    Elapsed Time is 5860 ticks
    9 choose 4
    126
    Elapsed Time is 22123 ticks
    52 choose 5
    2598960
    Elapsed Time is 25825 ticks
    200 choose 5
    2535650040
    Elapsed Time is 27490 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 161723 ticks
*/


//This is by far the fastest way to find the BC based on multiple tests. 
public static double BinomialCoefficient2(int n, int k) {
    if (k > n) return 0;
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so use the smaller k.
    
    double val= 1;
    for (int rk=1, rn=n-k+1; rk<=k; ++rk,++rn) {
        val = (rn)*val /(rk);
    }
    return val;
}
    
/*
    4 choose 4
    1
    Elapsed Time is 5824 ticks
    9 choose 4
    126
    Elapsed Time is 7983 ticks
    52 choose 5
    2598960
    Elapsed Time is 9820 ticks
    200 choose 5
    2535650040
    Elapsed Time is 10157 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 73985 ticks
*/
    
    
ZXYNINE
  • 712
  • 4
  • 5