4

Given a bit array of fixed length and the number of 0s and 1s it contains, how can I arrange all possible combinations such that returning the i-th combinations takes the least possible time? It is not important the order in which they are returned.

Here is an example:

array length = 6
number of 0s = 4
number of 1s = 2

possible combinations (6! / 4! / 2!) 000011 000101 000110 001001 001010 001100 010001 010010 010100 011000 100001 100010 100100 101000 110000

problem 1st combination = 000011 5th combination = 001010 9th combination = 010100

With a different arrangement such as 100001 100010 100100 101000 110000 001100 010001 010010 010100 011000 000011 000101 000110 001001 001010

it shall return 1st combination = 100001 5th combination = 110000 9th combination = 010100

Currently I am using a O(n) algorithm which tests for each bit whether it is a 1 or 0. The problem is I need to handle lots of very long arrays (in the order of 10000 bits), and so it is still very slow (and caching is out of the question). I would like to know if you think a faster algorithm may exist.

Thank you

nalply
  • 26,770
  • 15
  • 78
  • 101
Flavio
  • 846
  • 1
  • 9
  • 21
  • Perhaps this answer will help you: http://stackoverflow.com/a/311716/220060 – nalply Aug 18 '12 at 14:16
  • How many of the combinations do you ultimately have to return? – David Aug 18 '12 at 14:26
  • Just a small set of all the possibilities (of course, as they are 2^10000), but even if they are just 1000 it takes lot of time. – Flavio Aug 18 '12 at 15:13
  • Hmmm. I don't think I understand the question. If it doesn't matter which order the set is in, then, in your example, can't you return any six-bit value with 2-bits set as the 1st combination, and any six-bit value with 2-bits set (that you haven't already chosen) as the 5th combination, etc.? – David Aug 18 '12 at 15:38
  • Thank you nalply, but I can't understand how that code might be relevant to this problem. My problem is that I have a set (as described above), and I need to extract exactly the i-th and j-th and n-th combinations. It's not important what the i-th or j-th is, as long as the algorithm is consistent (that is if I ask for the i-th it always return the same) and possibly runs faster than O(n) (maybe O(ln n) or even O(1)) – Flavio Aug 18 '12 at 15:42
  • David, order is not important but it must always return the same combination when I ask for the i-th combination. If I write a=combination(i), b=combination(j) or b=combination(j), a=combination(i), then a and b must end up with the same values in both cases – Flavio Aug 18 '12 at 15:50
  • Some useful information here: http://stackoverflow.com/questions/5878768/determine-whether-a-symbol-is-part-of-the-ith-combination-ncr – MBo Aug 18 '12 at 16:19
  • What language are you working in? – David Aug 18 '12 at 17:26
  • Must say, I'm a bit puzzled also. So you don't need to **enumerate every** combination, but only to **access** a small subset of these (let's say N of them). But since you say **order** doesn't matter, just repeatability, then it seems that all you'd need is a deterministic _method_ of enumeration (of which there are several) and to run it N times to generate the first N cases. So what are we missing? (Is your underlying problem a sampling problem? Are you having to draw samples repeatedly from the set of all enumerations but with different N?) – Assad Ebrahim Aug 18 '12 at 17:29
  • I probably need to state the problem more crearly. Let's say I have 24 bits, 2^24 combinations. Of these, consider only those with five 1s, that are (24! / 5! / 19!) = 42504. Assuming they are ordered, how can I tell in O(1) which combination is the 2978th? (I'm working in C++) – Flavio Aug 18 '12 at 17:57
  • The answer to that depends _very_ much on what the ordering is. I gave an answer below that will be very fast and will work as long as the object where combination() is implemented will continue to exist through all the calls to combination(). – David Aug 18 '12 at 20:16
  • If you're not satisfied with David's answer, may I suggest revealing more about the context / motivation / use case for this problem. E.g is the computation "one off" or are you needing to repeat it (repeated sampling)? Do you really just want a particular combination e.g 2987th, and will **never** care about 2977th or 2979th? Do you know a priori how far into the subset of desirable combinations you need to go? Finally, are you working with a single fixed configuration (B bits, K 1's, B-K 0's) and sampling it repeatedly, or is your problem about general configurations? – Assad Ebrahim Aug 18 '12 at 22:17
  • I recommend reading http://realtimecollisiondetection.net/blog/?p=78 which explains generation of such combinations in case of native bit-length with bit hacks – aka.nice Aug 18 '12 at 22:56

3 Answers3

1

I'm not sure I understand the problem, but if you only want the i-th combination without generating the others, here is a possible algorithm:

There are C(M,N)=M!/(N!(M-N)!) combinations of N bits set to 1 having at most highest bit at position M.

You want the i-th: you iteratively increment M until C(M,N)>=i

while( C(M,N) < i ) M = M + 1

That will tell you the highest bit that is set. Of course, you compute the combination iteratively with

C(M+1,N) = C(M,N)*(M+1)/(M+1-N)

Once found, you have a problem of finding (i-C(M-1,N))th combination of N-1 bits, so you can apply a recursion in N...
Here is a possible variant with D=C(M+1,N)-C(M,N), and I=I-1 to make it start at zero

SOL=0
I=I-1
while(N>0)
    M=N
    C=1
    D=1
    while(i>=D)
        i=i-D
        M=M+1
        D=N*C/(M-N)
        C=C+D
    SOL=SOL+(1<<(M-1))
    N=N-1
RETURN SOL

This will require large integer arithmetic if you have that many bits...

aka.nice
  • 9,100
  • 1
  • 28
  • 40
0

If the ordering doesn't matter (it just needs to remain consistent), I think the fastest thing to do would be to have combination(i) return anything you want that has the desired density the first time combination() is called with argument i. Then store that value in a member variable (say, a hashmap that has the value i as key and the combination you returned as its value). The second time combination(i) is called, you just look up i in the hashmap, figure out what you returned before and return it again.

Of course, when you're returning the combination for argument(i), you'll need to make sure it's not something you have returned before for some other argument.

If the number you will ever be asked to return is significantly smaller than the total number of combinations, an easy implementation for the first call to combination(i) would be to make a value of the right length with all 0s, randomly set num_ones of the bits to 1, and then make sure it's not one you've already returned for a different value of i.

David
  • 1,429
  • 8
  • 8
0

Your problem appears to be constrained by the binomial coefficient. In the example you give, the problem can be translated as follows:

there are 6 items that can be chosen 2 at a time. By using the binomial coefficient, the total number of unique combinations can be calculated as N! / (K! (N - K)!, which for the case of K = 2 simplifies to N(N-1)/2. Plugging 6 in for N, we get 15, which is the same number of combinations that you calculated with 6! / 4! / 2! - which appears to be another way to calculate the binomial coefficient that I have never seen before. I have tried other combinations as well and both formulas generate the same number of combinations. So, it looks like your problem can be translated to a binomial coefficient problem.

Given this, it looks like you might be able to take advantage of a class that I wrote to handle common functions for working with the binomial coefficient:

  1. Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.

  2. Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.

  3. Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.

  4. Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.

  5. The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.

  6. There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.

To read about this class and download the code, see Tablizing The Binomial Coeffieicent.

It should not be hard to convert this class to the language of your choice.

There may be some limitations since you are using a very large N that could end up creating larger numbers than the program can handle. This is especially true if K can be large as well. Right now, the class is limited to the size of an int. But, it should not be hard to update it to use longs.

Bob Bryan
  • 3,687
  • 1
  • 32
  • 45