16

I want to pre-compute some values for each combination in a set of combinations. For example, when choosing 3 numbers from 0 to 12, I'll compute some value for each one:

>>> for n in choose(range(13), 3):
    print n, foo(n)

(0, 1, 2) 78
(0, 1, 3) 4
(0, 1, 4) 64
(0, 1, 5) 33
(0, 1, 6) 20
(0, 1, 7) 64
(0, 1, 8) 13
(0, 1, 9) 24
(0, 1, 10) 85
(0, 1, 11) 13
etc...

I want to store these values in an array so that given the combination, I can compute its and get the value. For example:

>>> a = [78, 4, 64, 33]
>>> a[magic((0,1,2))]
78

What would magic be?

Initially I thought to just store it as a 3-d matrix of size 13 x 13 x 13, so I can easily index it that way. While this is fine for 13 choose 3, this would have way too much overhead for something like 13 choose 7.

I don't want to use a dict because eventually this code will be in C, and an array would be much more efficient anyway.

UPDATE: I also have a similar problem, but using combinations with repetitions, so any answers on how to get the rank of those would be much appreciated =).

UPDATE: To make it clear, I'm trying to conserve space. Each of these combinations actually indexes into something take up a lot of space, let's say 2 kilobytes. If I were to use a 13x13x13 array, that would be 4 megabytes, of which I only need 572 kilobytes using (13 choose 3) spots.

Claudiu
  • 224,032
  • 165
  • 485
  • 680
  • 3
    In permutations, combinations and partitions, the literature term is "rank" rather than "index." Search "rank combination algorithm". :) This is a really good page: http://home.hccnet.nl/david.dirkse/math/rank/ranking.html – Heath Hunnicutt Jun 29 '10 at 17:33
  • When you say "I don't want to use a dict" ... does it mean that you don't want to use a hash table? – Dr. belisarius Jun 29 '10 at 17:33
  • @belisarius: yep, sorry for the python terminology – Claudiu Jun 29 '10 at 17:42
  • If you want to pre-compute, you'll always end up with "M Choose N" elements. If you want a structure other than an array, you'll have to implement at least 1 pointer per node. So, why do you think that there is a better approach than an array? – Dr. belisarius Jun 29 '10 at 17:46
  • @belisarius: an array is fine. i just don't want a 13 x 13 x 13 array when I only need a 286-element array. – Claudiu Jun 29 '10 at 17:56

7 Answers7

13

Here is a conceptual answer and a code based on how lex ordering works. (So I guess my answer is like that of "moron", except that I think that he has too few details and his links have too many.) I wrote a function unchoose(n,S) for you that works assuming that S is an ordered list subset of range(n). The idea: Either S contains 0 or it does not. If it does, remove 0 and compute the index for the remaining subset. If it does not, then it comes after the binomial(n-1,k-1) subsets that do contain 0.

def binomial(n,k):
    if n < 0 or k < 0 or k > n: return 0
    b = 1
    for i in xrange(k): b = b*(n-i)/(i+1)
    return b

def unchoose(n,S):
    k = len(S)
    if k == 0 or k == n: return 0
    j = S[0]
    if k == 1: return j
    S = [x-1 for x in S]
    if not j: return unchoose(n-1,S[1:])
    return binomial(n-1,k-1)+unchoose(n-1,S)

def choose(X,k):
    n = len(X)
    if k < 0 or k > n: return []
    if not k: return [[]]
    if k == n: return [X]
    return [X[:1] + S for S in choose(X[1:],k-1)] + choose(X[1:],k)

(n,k) = (13,3)
for S in choose(range(n),k): print unchoose(n,S),S

Now, it is also true that you can cache or hash values of both functions, binomial and unchoose. And what's nice about this is that you can compromise between precomputing everything and precomputing nothing. For instance you can precompute only for len(S) <= 3.

You can also optimize unchoose so that it adds the binomial coefficients with a loop if S[0] > 0, instead of decrementing and using tail recursion.

Greg Kuperberg
  • 3,995
  • 22
  • 24
  • ah awesome, makes a lot of sense! would you happen to know a solution for combinations with repetitions? e.g. (0,0,0),(0,0,1),(0,0,2),...,(0,1,1),(0,1,2), etc... – Claudiu Jun 29 '10 at 19:23
  • 2
    Combinations with repetitions are an equivalent problem. First, you have the formula multibinomial(n,k) = binomial(n+k-1,k). Second, you can divide the combinations into two kinds, those that use 0 and come first, and those that do not use 0 and come after the multibinomial(n,k-1) combinations that do. The code would be very similar and I won't post it. (In fact, there is a standard bijection, called "stars and bars", between (n,k) combinations with repetitions and (n+k-1,k) combinations without repetitions. It preserves lex ordering.) – Greg Kuperberg Jun 29 '10 at 19:39
  • I think I can figure it out from there - thanks for the clear answer! You explained this in 8 lines of code and a few sentences much better than that entire article. – Claudiu Jun 29 '10 at 19:44
4

You can try using the lexicographic index of the combination. Maybe this page will help: http://saliu.com/bbs/messages/348.html

This MSDN page has more details: Generating the mth Lexicographical Element of a Mathematical Combination.

NOTE: The MSDN page has been retired. If you download the documentation at the above link, you will find the article on page 10201 of the pdf that is downloaded.

To be a bit more specific:

When treated as a tuple, you can order the combinations lexicographically.

So (0,1,2) < (0,1,3) < (0,1,4) etc.

Say you had the number 0 to n-1 and chose k out of those.

Now if the first element is zero, you know that it is one among the first n-1 choose k-1.

If the first element is 1, then it is one among the next n-2 choose k-1.

This way you can recursively compute the exact position of the given combination in the lexicographic ordering and use that to map it to your number.

This works in reverse too and the MSDN page explains how to do that.

Zodman
  • 3,178
  • 2
  • 32
  • 38
  • +1 I've never seen it explained as well as it is on the msdn page (I never would have thought to search for something like this there either). This way he could use the lexicographic index as an array index and practically get a perfect hash. – IVlad Jun 29 '10 at 17:46
  • @IVlad: Yes, I was surprised that to find that on MSDN! –  Jun 29 '10 at 17:48
  • Hmm it doesn't seem to work. e.g. (0, 1, 4) should have rank 2: (0,1,2),(0,1,3),(0,1,4), but doing (4 choose 3) + (1 choose 2) + (0 choose 1) gives 4..? – Claudiu Jun 29 '10 at 18:23
  • You don't add the numbers you get in the intermediate. If it is the smallest, you add a zero (i.e. do nothing). It might get tricky, though. –  Jun 29 '10 at 18:54
  • I've rehosted the MSDN article on my blog, at the end of my latest post about the mathematics and algorithms of k-combinations (all code included): https://www.redperegrine.net/2021/04/10/software-algorithms-for-k-combinations/ – Zodman Apr 10 '21 at 00:06
2

Use a hash table to store the results. A decent hash function could be something like:

h(x) = (x1*p^(k - 1) + x2*p^(k - 2) + ... + xk*p^0) % pp

Where x1 ... xk are the numbers in your combination (for example (0, 1, 2) has x1 = 0, x2 = 1, x3 = 2) and p and pp are primes.

So you would store Hash[h(0, 1, 2)] = 78 and then you would retrieve it the same way.

Note: the hash table is just an array of size pp, not a dict.

IVlad
  • 43,099
  • 13
  • 111
  • 179
  • I was wondering that myself. That's why the self-defence edit to my answer, which is obviously very similar to yours. –  Jun 29 '10 at 17:47
  • No idea for the downvote. Seems reasonably fine, except that you probably need to find p >= n (pp could be smaller I suppose). –  Jun 29 '10 at 17:50
2

I would suggest a specialised hash table. The hash for a combination should be the exclusive-or of the hashes for the values. Hashes for values are basically random bit-patterns.

You could code the table to cope with collisions, but it should be fairly easy to derive a minimal perfect hash scheme - one where no two three-item combinations give the same hash value, and where the hash-size and table-size are kept to a minimum.

This is basically Zobrist hashing - think of a "move" as adding or removing one item of the combination.

EDIT

The reason to use a hash table is that the lookup performance O(n) where n is the number of items in the combination (assuming no collisions). Calculating lexicographical indexes into the combinations is significantly slower, IIRC.

The downside is obviously the up-front work done to generate the table.

  • I don't agree that lexicographic index generation will be _significantly_ slower than hash. If you have a lookup table of N choose K, finding the lexicographic index is O(k) too and could actually be faster, but who knows, till we measure :-) In fact, we probably don't even need the lookup table if we do it cleverly. –  Jun 29 '10 at 17:52
  • OK - I confess, I assumed calculating the rank was slower that it is. I should have checked first. –  Jun 29 '10 at 18:23
  • @Moron: Not really - I didn't say it, but I was thinking the complexity would be something like O(nm) or O(n log m) where m is the number of items to choose from. I guess I could argue that the arithmetic ops are O(log m) - thinking of the needed bit-width for huge sets. Not exactly a *sane* argument, though. –  Jun 29 '10 at 19:12
  • I've thought of a (slightly) sane justification for this answer. To calculate the lexicographic rank, you need your selection of items to be sorted. Otherwise, you'd need non-constant-time checks to account for the gaps in your range left by already-handled items. The hash method doesn't need this. The lexicographic rank is therefore `O(n log n)` (due to the sort) if the items aren't already sorted. Counting in Discworld troll, given that n is neither 1 nor 2 but many, then `n log n` must be lots. See - entirely sane ;-) –  Jun 29 '10 at 19:52
  • @Steve314: Heh yep, that's true. I solved that particular issue in my hackish way by precomputing the index for any ordering of elements.. – Claudiu Jun 30 '10 at 05:16
1

I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. It performs the following tasks:

  1. Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.

  2. Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration and it does not use very much memory. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.

  3. Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.

  4. Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.

  5. The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.

  6. There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.

To read about this class and download the code, see Tablizing The Binomial Coeffieicent.

It should not be hard to convert this class to C++.

Bob Bryan
  • 3,687
  • 1
  • 32
  • 45
1

For now, I've reached a compromise: I have a 13x13x13 array which just maps to the index of the combination, taking up 13x13x13x2 bytes = 4 kilobytes (using short ints), plus the normal-sized (13 choose 3) * 2 kilobytes = 572 kilobytes, for a total of 576 kilobytes. Much better than 4 megabytes, and also faster than a rank calculation!

I did this partly cause I couldn't seem to get Moron's answer to work. Also this is more extensible - I have a case where I need combinations with repetitions, and I haven't found a way to compute the rank of those, yet.

Claudiu
  • 224,032
  • 165
  • 485
  • 680
1

What you want are called combinadics. Here's my implementation of this concept, in Python:

def nthresh(k, idx):
  """Finds the largest value m such that C(m, k) <= idx."""
  mk = k
  while ncombs(mk, k) <= idx:
    mk += 1
  return mk - 1


def idx_to_set(k, idx):
  ret = []
  for i in range(k, 0, -1):
    element = nthresh(i, idx)
    ret.append(element)
    idx -= ncombs(element, i)
  return ret


def set_to_idx(input):
  ret = 0
  for k, ck in enumerate(sorted(input)):
    ret += ncombs(ck, k + 1)
  return ret
Nick Johnson
  • 100,655
  • 16
  • 128
  • 198