
I have some performance-sensitive code on a Node.js server that needs to count combinations. Based on this SO answer, I used this simple recursive function for computing n choose k:

// n choose k via the recurrence C(n, k) = (n / k) * C(n-1, k-1)
function choose(n, k) {
    if (k === 0) return 1;
    return (n * choose(n - 1, k - 1)) / k;
}

Then, since we all know iteration is almost always faster than recursion, I wrote this function based on the multiplicative formula:

// Iterative version using the multiplicative formula:
// C(n, k) = product over i = 1..k of (n + 1 - i) / i
function choosei(n, k) {
    var result = 1;
    for (var i = 1; i <= k; i++) {
        result *= (n + 1 - i) / i;
    }
    return result;
}

I ran a few benchmarks on my machine. Here are the results of just one of them:

Recursive x 178,836 ops/sec ±7.03% (60 runs sampled)
Iterative x 550,284 ops/sec ±5.10% (51 runs sampled)
Fastest is Iterative

The results consistently showed that the iterative method is indeed about 3 to 4 times faster than the recursive method in Node.js (at least on my machine).

This is probably fast enough for my needs, but is there any way to make it faster? My code has to call this function very frequently, sometimes with fairly large values of n and k, so the faster the better.
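For reference, the output format above is Benchmark.js's, so the numbers were presumably produced by a harness along these lines (the exact n and k used are my assumption):

const Benchmark = require('benchmark');

new Benchmark.Suite()
    .add('Recursive', function() { choose(30, 15); })
    .add('Iterative', function() { choosei(30, 15); })
    .on('cycle', function(event) { console.log(String(event.target)); })
    .on('complete', function() {
        console.log('Fastest is ' + this.filter('fastest').map('name'));
    })
    .run();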

EDIT

After running a few more tests with le_m's and Mike's solutions, it turns out that while both are significantly faster than the iterative method I proposed, Mike's method using Pascal's triangle appears to be slightly faster than le_m's log table method.

Recursive x 189,036 ops/sec ±8.83% (58 runs sampled)
Iterative x 538,655 ops/sec ±6.08% (51 runs sampled)
LogLUT x 14,048,513 ops/sec ±9.03% (50 runs sampled)
PascalsLUT x 26,538,429 ops/sec ±5.83% (62 runs sampled)
Fastest is PascalsLUT

The logarithmic look-up method was around 26-28 times faster than the iterative method in my tests, and the method using Pascal's triangle was about 1.3 to 1.8 times faster than the logarithmic look-up method.

Note that I followed le_m's suggestion of precomputing the logarithms with higher precision using mathjs, then converted them back to regular JavaScript Numbers (which are always double-precision 64-bit floats).
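For completeness, that precomputation can be done with mathjs roughly as follows (a sketch, not le_m's exact code; the precision setting is an assumption):

const { create, all } = require('mathjs');
const math = create(all, { precision: 64 }); // 64 significant digits for BigNumbers

const size = 1000;
const logf = new Array(size + 1);
logf[0] = 0;
let acc = math.bignumber(0);
for (let i = 1; i <= size; i++) {
    acc = math.add(acc, math.log(math.bignumber(i))); // high-precision log(i!)
    logf[i] = acc.toNumber(); // convert back to a plain 64-bit float
}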

– NanoWizard
  • Memoization can be the fastest choice, if space is not your concern. – Godfather Jun 07 '16 at 12:57
  • From the comment on the answer you linked and from the [wikipedia page](https://en.wikipedia.org/wiki/Binomial_coefficient#Factorial_formula): `if (k > n/2) return choose(n, n-k);` - which will help when both `n` and `k` are large, but the additional branch might slow down overall execution. – Bergi Jun 07 '16 at 13:16
  • @Godfather @NanoWizard And if you don't have the space to memoize the choosei results, you can get good performance increases by memoizing the factorials. You can even use a dynamic programming technique to compute the factorial `fact(n) = max_known_fact_value(n) * [i for every int from max_known_fact_int(n) to n]`, which would save significant time – gbtimmon Jun 07 '16 at 13:25
  • yeah @gbtimmon that will be great! – Godfather Jun 07 '16 at 13:39
  • Build Pascal's triangle. It is much faster. – Mike 'Pomax' Kamermans Jun 09 '16 at 03:02
  • @Mike'Pomax'Kamermans Why build the triangle when you just need the (log)-factorials for an O(1) solution? – le_m Jun 09 '16 at 03:03
  • @le_m you might have misunderstood the instruction: build the triangle, save the LUT, use that LUT. If you need to, at any point during the runtime of your application, actually *compute* binomial terms, you are wasting time. Those values are constants that can simply be included and consulted in O(1). – Mike 'Pomax' Kamermans Jun 09 '16 at 03:25
  • @Mike'Pomax'Kamermans I don't think I misunderstood: A pascal triangle LUT has an O(n²) space requirement which might be problematic. – le_m Jun 09 '16 at 03:36
  • Not for practical purposes though. Using UINT16, you can store the LUT up to n=19 in 380 bytes + array overhead. UINT32 allows for up to n=35 in a fraction over 2kb. UINT64 allows up to n=68 and now we've properly shot beyond "do you honestly believe you need binomial coefficients this high?" values at a fraction over 18kb. These are mallocs that no one in their right mind would care about if they already committed to doing binomial computation. Especially in Node.js, where these LUTs are insignificant compared to even base Node's memory footprint. – Mike 'Pomax' Kamermans Jun 09 '16 at 03:42
  • just noticed that last edit: remember that the largest true _integer_ in JS is only 2^53. After that, `(v - (v + n))` is no longer guaranteed to be `n`. For instance: `a = 2**53; b = 2**53 + 1; b - a` is zero in JS, not 1. – Mike 'Pomax' Kamermans Apr 11 '20 at 16:13
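As a side note, Bergi's symmetry suggestion from the comments would look like this applied to the iterative version (a sketch, not benchmarked here):

function chooseSym(n, k) {
    if (k > n - k) k = n - k; // C(n, k) === C(n, n - k); iterate over the smaller side
    var result = 1;
    for (var i = 1; i <= k; i++) {
        result *= (n + 1 - i) / i;
    }
    return result;
}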

3 Answers


The following algorithm has a run-time complexity of O(1), given a linear look-up table of log-factorials with space complexity O(n).

Limiting n and k to the range [0, 1000] makes sense, since binomial(1000, 500) is already dangerously close to Number.MAX_VALUE. We would thus need a look-up table of 1001 entries (indices 0 through 1000).

On a modern JavaScript engine, a compact array of n numbers has a size of n * 8 bytes. A full look-up table would thus require 8 kilobytes of memory. If we limit our input to the range [0, 100], the table would only occupy 800 bytes.

var logf = [0, 0, 0.6931471805599453, 1.791759469228055, 3.1780538303479458, 4.787491742782046, 6.579251212010101, 8.525161361065415, 10.60460290274525, 12.801827480081469, 15.104412573075516, 17.502307845873887, 19.987214495661885, 22.552163853123425, 25.19122118273868, 27.89927138384089, 30.671860106080672, 33.50507345013689, 36.39544520803305, 39.339884187199495, 42.335616460753485, 45.38013889847691, 48.47118135183523, 51.60667556776438, 54.78472939811232, 58.00360522298052, 61.261701761002, 64.55753862700634, 67.88974313718154, 71.25703896716801, 74.65823634883016, 78.0922235533153, 81.55795945611504, 85.05446701758152, 88.58082754219768, 92.1361756036871, 95.7196945421432, 99.33061245478743, 102.96819861451381, 106.63176026064346, 110.32063971475739, 114.0342117814617, 117.77188139974507, 121.53308151543864, 125.3172711493569, 129.12393363912722, 132.95257503561632, 136.80272263732635, 140.67392364823425, 144.5657439463449, 148.47776695177302, 152.40959258449735, 156.3608363030788, 160.3311282166309, 164.32011226319517, 168.32744544842765,  172.3527971391628, 176.39584840699735, 180.45629141754378, 184.53382886144948, 188.6281734236716, 192.7390472878449, 196.86618167289, 201.00931639928152, 205.1681994826412, 209.34258675253685, 213.53224149456327, 217.73693411395422, 221.95644181913033, 226.1905483237276, 230.43904356577696, 234.70172344281826, 238.97838956183432, 243.2688490029827, 247.57291409618688, 251.8904022097232, 256.22113555000954, 260.5649409718632, 264.9216497985528, 269.2910976510198, 273.6731242856937, 278.0675734403661, 282.4742926876304, 286.893133295427, 291.3239500942703, 295.76660135076065, 300.22094864701415, 304.6868567656687, 309.1641935801469, 313.65282994987905, 318.1526396202093, 322.66349912672615, 327.1852877037752, 331.7178871969285, 336.26118197919845, 340.815058870799, 345.37940706226686, 349.95411804077025, 354.5390855194408, 359.1342053695754, 363.73937555556347];

function binomial(n, k) {
    return Math.exp(logf[n] - logf[n-k] - logf[k]);
}

console.log(binomial(5, 3)); // 10, up to floating-point rounding

Explanation

Starting with the original iterative algorithm, we first replace the product with a sum of logarithms:

function binomial(n, k) {
    var logresult = 0;
    for (var i = 1; i <= k; i++) {
        logresult += Math.log(n + 1 - i) - Math.log(i);
    }
    return Math.exp(logresult);
}

Our loop now sums over k terms. If we rearrange the sum, we can see that we are summing runs of consecutive logarithms, log(1) + log(2) + ... + log(k) and so on, each of which we can replace with a precomputed sum_of_logs(k), which is exactly log(k!). Precomputing these values and storing them in our look-up table logf then leads to the one-liner algorithm above.
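To see the rearrangement concretely, here is a small sanity check (logFactorial is a throwaway helper, not part of the final code):

// sum over i = 1..k of [log(n+1-i) - log(i)]
//   = [log(n) + ... + log(n-k+1)] - [log(1) + ... + log(k)]
//   = log(n!) - log((n-k)!) - log(k!)
var n = 10, k = 4;

var direct = 0;
for (var i = 1; i <= k; i++) direct += Math.log(n + 1 - i) - Math.log(i);

function logFactorial(m) {
    var s = 0;
    for (var j = 2; j <= m; j++) s += Math.log(j);
    return s;
}
var viaFactorials = logFactorial(n) - logFactorial(n - k) - logFactorial(k);

console.log(direct, viaFactorials); // both ≈ Math.log(210), since C(10, 4) = 210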

Computing the look-up table:

I recommend precomputing the look-up table with higher precision and converting the resulting elements to 64-bit floats. If you do not need that little bit of additional precision, or want to run this code on the client side, use this:

var size = 1000, logf = new Array(size + 1);
logf[0] = 0;
for (var i = 1; i <= size; ++i) logf[i] = logf[i-1] + Math.log(i);

Numerical precision:

By using log-factorials, we avoid precision problems inherent to storing raw factorials.

We could even use Stirling's approximation for log(n!) instead of a look-up table and still get 12 significant figures for the above computation, in both run-time and space complexity O(1):

function logf(n) {
    // Stirling series: log n! ~ (n + 1/2)*log(n) - n + log(2*pi)/2 + 1/(12n) - 1/(360n^3)
    return n === 0 ? 0 : (n + .5) * Math.log(n) - n + 0.9189385332046728 + 0.08333333333333333 / n - 0.002777777777777778 * Math.pow(n, -3);
}

function binomial(n, k) {
    return Math.exp(logf(n) - logf(n - k) - logf(k));
}

console.log(binomial(1000, 500)); // 2.7028824094539536e+299
– le_m
  • Arithmetic operations are performed on 64-bit floating point values, thus you would need to e.g. format the result before printing by truncating insignificant fractional digits. – le_m Jun 09 '16 at 03:29
  • I think this is a typo - `log(1) + log(2) + ... + log(k)` should be `log(k!)` instead of `log(n!)`. – Boatmarker Oct 19 '19 at 13:34

Never compute factorials, they grow too quickly. Instead, compute the result you want. In this case, you want the binomial coefficients, which have an incredibly simple geometric construction: you can build Pascal's triangle as you need it, and do it using plain arithmetic.

Start with [1] and [1,1]. The next row has [1] at the start, [1+1] in the middle, and [1] at the end: [1,2,1]. Next row: [1] at the start, the sum of the first two terms in spot 2, the sum of the next two terms in spot three, and [1] at the end: [1,3,3,1]. Next row: [1], then 1+3=4, then 3+3=6, then 3+1=4, then [1] at the end, and so on and so on. As you can see, no factorials, logarithms, or even multiplications: just super fast addition with clean integer numbers. So simple, you can build a massive lookup table by hand.

And you should.

Never compute in code what you can compute by hand and just include as constants for immediate lookup; in this case, writing out the table for up to something around n=20 is absolutely trivial, and you can then just use that as your "starting LUT" and probably never even access the high rows.

But, if you do need them, or more, then because you can't build an infinite lookup table you compromise: you start with a pre-specified LUT, and a function that can "fill it up" to some term you need that's not in it yet:

// step 1: a basic LUT with a few steps of Pascal's triangle
const binomials = [
  [1],
  [1,1],
  [1,2,1],
  [1,3,3,1],
  [1,4,6,4,1],
  [1,5,10,10,5,1],
  [1,6,15,20,15,6,1],
  [1,7,21,35,35,21,7,1],
  [1,8,28,56,70,56,28,8,1],
  //  ...
];

// step 2: a function that builds out the LUT if it needs to.
module.exports = function binomial(n,k) {
  while(n >= binomials.length) {
    let s = binomials.length;
    let nextRow = [];
    nextRow[0] = 1;
    for(let i=1, prev=s-1; i<s; i++) {
      nextRow[i] = binomials[prev][i-1] + binomials[prev][i];
    }
    nextRow[s] = 1;
    binomials.push(nextRow);
  }
  return binomials[n][k];
};

Since this is an array of ints, the memory footprint is tiny. For a lot of work involving binomials, we realistically don't even need more than two bytes per integer, making this a minuscule lookup table: we don't need more than 2 bytes until you need binomials higher than n=19, and the full lookup table up to n=19 takes up a measly 380 bytes. This is nothing compared to the rest of your program. Even if we allow for 32 bit ints, we can get up to n=35 in a mere 2380 bytes.
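If you want to pin that footprint down exactly, the same row-building logic works with typed arrays (my sketch, not part of the original answer; entries are only exact while they fit in 32 bits):

const rows = [Uint32Array.of(1), Uint32Array.of(1, 1)];

function binomialTyped(n, k) {
  while (n >= rows.length) {
    const s = rows.length;
    const prev = rows[s - 1];
    const next = new Uint32Array(s + 1); // fixed-width row: exactly (s + 1) * 4 bytes
    next[0] = next[s] = 1;
    for (let i = 1; i < s; i++) next[i] = prev[i - 1] + prev[i];
    rows.push(next);
  }
  return rows[n][k];
}

console.log(binomialTyped(8, 3)); // 56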

So the lookup is fast: either O(constant) for previously computed values, (n*(n+1))/2 steps if we have no LUT at all (in big O notation, that would be O(n²), but big O notation is almost never the right complexity measure), and somewhere in between for terms we need that aren't in the LUT yet. Run some benchmarks for your application, which will tell you how big your initial LUT should be, simply hard code that (seriously, these are constants, they are exactly the kind of values that should be hard coded), and keep the generator around just in case.

However, do remember that you're in JavaScript land, and you are constrained by the JavaScript numerical type: integers only go up to 2^53, beyond that the integer property (every n has a distinct m=n+1 such that m-n=1) is not guaranteed. This should hardly ever be a problem, though: once we hit that limit, we're dealing with binomial coefficients that you should never even be using.
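A quick demonstration of that cliff:

const a = 2 ** 53;
const b = 2 ** 53 + 1; // rounds back down to 2^53
console.log(b - a); // 0, not 1
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true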

– Mike 'Pomax' Kamermans
  • There is an error in the last row of the lut array. For anyone copy pasting, replace 54 with 56 or just remove that line. Lazy man's proof: https://www.google.com/search?q=8+choose+3 – Basic Block Jun 09 '18 at 11:59
  • I didn't realize you would see the comment, but good you fixed it. – Basic Block Jun 09 '18 at 19:18
  • For those who like a bit of JS madness: this is the part after `let s = ...` in the while loop, as a one-liner: `binomials.push(new Array(s+1).fill(0).map((_, i) => i == 0 || i == s ? 1 : binomials[s-1][i-1] + binomials[s-1][i]));` – sigalor Sep 16 '18 at 00:10
  • @sigalor no need for `new Array(s+1).fill(0).map()`, just use modern JS: `[...Array(s+1)].map()`. If you're trying to micro-optimize in a way that doesn't actually execute any faster anyway =) – Mike 'Pomax' Kamermans Sep 16 '18 at 00:28
  • And as a several-years-later comment, we now have Uint8Array and Uint16Array, which *are* sized arrays, so you can now _force_ your memory footprint using `new Uint8Array(rowsize)` or the (completely overkill) `new Uint16Array(rowsize)` – Mike 'Pomax' Kamermans Sep 20 '20 at 22:31

Using Pascal's triangle is a fast method for calculating n choose k.

The fastest method I know of would be to make use of the results from "On the Complexity of Calculating Factorials". Just calculate all three factorials, then perform the two division operations, each with complexity M(n log n).
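For concreteness, a minimal BigInt sketch of that three-factorial approach (exact integer results, though plain BigInt multiplication does not achieve the bound from the paper):

function factorialBig(n) {
    let f = 1n;
    for (let i = 2n; i <= n; i++) f *= i;
    return f;
}

function binomialBig(n, k) {
    n = BigInt(n); k = BigInt(k);
    return factorialBig(n) / (factorialBig(k) * factorialBig(n - k));
}

console.log(binomialBig(50, 25)); // 126410606437752n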

  • There is no "just" in "just calculate a factorial": calculating `n` factorial requires `n-1` multiplications, something which CPUs are _incredibly_ bad at compared to addition. The paper you're citing talks about complexity in terms of `n` only, not in terms of CPU operations, which is ostensibly what the question was about. It asked for efficient computation. – Mike 'Pomax' Kamermans Jun 05 '23 at 16:16