6

I need to generate binary numbers with the same quantity of ones (or zeros) in random order.
Does anyone know any efficient algorithm for fixed-length binary numbers? Example for 2 ones and 4 digits (just to be more clear):

1100
1010
1001
0110
0101
0011

UPDATE Random order w/o repetitions is significant. Sequence of binary numbers required, not single permutation.

UNdedss
  • 168
  • 1
  • 8
  • I don't know of a closed formula for this (there might be one), but it wouldn't be too much work to take all permutations of 2 one bits in a fixed width number, filling the other slots with zero. – Tim Biegeleisen Jun 15 '17 at 13:22
  • You can take an array of `2's `powers till no of digits-1. Here it would be `arr[] = {1,2,4,8}`. Now find its all combinations in `two nested` loop (i,j) such that `i!=j` and these numbers would be `arr[i]+arr[j]` such that `i!=j`. If there are given more no. of ones, We can convert it to recursive function. – Sanket Makani Jun 15 '17 at 13:26
  • @sanket-makani random order is significant – UNdedss Jun 15 '17 at 13:39
  • Possible duplicate of [Efficient random permutation of n-set-bits](https://stackoverflow.com/questions/17010857/efficient-random-permutation-of-n-set-bits) – Paul R Jun 15 '17 at 13:56
  • @PaulR isn't that just to generate *one* of them? it wouldn't be easy to list them all using that – harold Jun 15 '17 at 14:05
  • @harold yes, Paus's link describes generation of a single random permutation, whether I need sequence. BTW your answer and links look comprehensive, but it will require time for analysis – UNdedss Jun 15 '17 at 14:18
  • My bad - duplicate close vote retracted - I'm pretty sure there is at least one duplicate out there somewhere, though, but it's hard to find... – Paul R Jun 15 '17 at 15:32

5 Answers5

7

If you have enough memory to store all the possible bit sequences, and you don't mind generating them all before you have the first result, then the solution would be to use some efficient generator to produce all possible sequences into a vector and then shuffle the vector using the Fisher-Yates shuffle. That's easy and unbiased (as long as you use a good random number generator to do the shuffle) but it can use a lot of memory if n is large, particularly if you are not sure you will need to complete the iteration.

But there are a couple of solutions which do not require keeping all the possible words in memory. (C implementations of the two solutions follow the text.)

1. Bit shuffle an enumeration

The fastest one (I think) is to first generate a random shuffle of bit values, and then iterate over the possible words one at a time applying the shuffle to the bits of each value. In order to avoid the complication of shuffling actual bits, the words can be generated in a Gray code order in which only two bit positions are changed from one word to the next. (This is also known as a "revolving-door" iteration because as each new 1 is added, some other 1 must be removed.) This allows the bit mask to be updated rapidly, but it means that successive entries are highly correlated, which may be unsuitable for some purposes. Also, for small values of n the number of possible bit shuffles is very limited, so there will not be a lot of different sequences produced. (For example, for the case where n is 4 and k is 2, there are 6 possible words which could be sequenced in 6! (720) different ways, but there are only 4! (24) bit-shuffles. This could be ameliorated slightly by starting the iteration at a random position in the sequence.)

It is always possible to find a Gray code. Here's an example for n=6, k=3: (The bold bits are swapped at each step. I wanted to underline them but for some inexplicable reason SO allows strikethrough but not underline.)

111000   010110   100011   010101
101100   001110   010011   001101
011100   101010   001011   101001
110100   011010   000111   011001
100110   110010   100101   110001

This sequence can be produced by a recursive algorithm similar to that suggested by @JasonBoubin -- the only difference is that the second half of each recursion needs to be produced in reverse order -- but it's convenient to use a non-recursive version of the algorithm. The one in the sample code below comes from Frank Ruskey's unpublished manuscript on Combinatorial Generation (Algorithm 5.7 on page 130). I modified it to use 0-based indexing, as well as adding the code to keep track of the binary representations.

2. Randomly generate an integer sequence and convert it to combinations

The "more" random but somewhat slower solution is to produce a shuffled list of enumeration indices (which are sequential integers in [0, n choose k)) and then find the word corresponding to each index.

The simplest pseudo-random way to produce a shuffled list of integers in a contiguous range is to use a randomly-chosen Linear Congruential Generator (LCG). An LCG is the recursive sequence xi = (a * xi-1 + c) mod m. If m is a power of 2, a mod 4 is 1 and c mod 2 is 1, then that recursion will cycle through all 2m possible values. To cycle through the range [0, n choose k), we simply select m to be the next larger power of 2, and then skip any values which are not in the desired range. (That will be fewer than half the values produced, for obvious reasons.)

To convert the enumeration index into an actual word, we perform a binomial decomposition of the index based on the fact that the set of n choose k words consists of n-1 choose k words starting with a 0 and n-1 choose k-1 words starting with a 1. So to produce the ith word:

  • if i < n-1 choose k we output a 0 and then the ith word in the set of n-1 bit words with k bits set;
  • otherwise, we output a 1 and then subtract n-1 choose k from i as the index into the set of n-1 bit words with k-1 bits set.

It's convenient to precompute all the useful binomial coefficients.

LCGs suffer from the disadvantage that they are quite easy to predict after the first few terms are seen. Also, some of the randomly-selected values of a and c will produce index sequences where successive indices are highly correlated. (Also, the low-order bits are always quite non-random.) Some of these problems could be slightly ameliorated by also applying a random bit-shuffle to the final result. This is not illustrated in the code below but it would slow things down very little and it should be obvious how to do it. (It basically consists of replacing 1UL<<n with a table lookup into the shuffled bits).

The C code below uses some optimizations which make it a bit challenging to read. The binomial coefficients are stored in a lower-diagonal array:

  row
index
 [ 0]   1
 [ 1]   1 1
 [ 3]   1 2 1
 [ 6]   1 3 3 1
 [10]   1 4 6 4 1

As can be seen, the array index for binom(n, k) is n(n+1)/2 + k, and if we have that index, we can find binom(n-1, k) by simply subtracting n, and binom(n-1, k-1) by subtracting n+1. In order to avoid needing to store zeros in the array, we make sure that we never look up a binomial coefficient where k is negative or greater than n. In particular, if we have arrived at a point in the recursion where k == n or k == 0, we can definitely know that the index to look up is 0, because there is only one possible word. Furthermore, index 0 in the set of words with some n and k will consist precisely of n-k zeros followed by k ones, which is the n-bit binary representation of 2k-1. By short-cutting the algorithm when the index reaches 0, we can avoid having to worry about the cases where one of binom(n-1, k) or binom(n-1, k-1) is not a valid index.

C code for the two solutions

Gray code with shuffled bits

void gray_combs(int n, int k) {
  /* bit[i] is the ith shuffled bit */
  uint32_t bit[n+1];
  {
    uint32_t mask = 1;
    for (int i = 0; i < n; ++i, mask <<= 1)
      bit[i] = mask;
    bit[n] = 0;
    shuffle(bit, n);
  }

  /* comb[i] for 0 <= i < k is the index of the ith bit
   * in the current combination. comb[k] is a sentinel. */
  int comb[k + 1];
  for (int i = 0; i < k; ++i) comb[i] = i;
  comb[k] = n;

  /* Initial word has the first k (shuffled) bits set */
  uint32_t word = 0;
  for (int i = 0; i < k; ++i) word |= bit[i];

  /* Now iterate over all combinations */
  int j = k - 1; /* See Ruskey for meaning of j */
  do {
    handle(word, n);
    if (j < 0) {
      word ^= bit[comb[0]] | bit[comb[0] - 1];
      if (--comb[0] == 0) j += 2;
    }
    else if (comb[j + 1] == comb[j] + 1) {
      word ^= bit[comb[j + 1]] | bit[j];
      comb[j + 1] = comb[j]; comb[j] = j;
      if (comb[j + 1] == comb[j] + 1) j += 2;
    }
    else if (j > 0) {
      word ^= bit[comb[j - 1]] | bit[comb[j] + 1];
      comb[j - 1] = comb[j]; ++comb[j];
      j -= 2;
    }
    else {
      word ^= bit[comb[j]] | bit[comb[j] + 1];
      ++comb[j];
    }
  } while (comb[k] == n);
}

LCG with enumeration index to word conversion

static const uint32_t* binom(unsigned n, unsigned k) {
  static const uint32_t b[] = {
    1,
    1, 1,
    1, 2, 1,
    1, 3, 3, 1,
    1, 4, 6, 4, 1,
    1, 5, 10, 10, 5, 1,
    1, 6, 15, 20, 15, 6, 1,
    // ... elided for space
  };
  return &b[n * (n + 1) / 2 + k];
}

static uint32_t enumerate(const uint32_t* b, uint32_t r, unsigned n, unsigned k) {
  uint32_t rv = 0;
  while (r) {
    do {
      b -= n;
      --n;
    } while (r < *b);
    r -= *b;
    --b;
    --k;
    rv |= 1UL << n;
  }
  return rv + (1UL << k) - 1;
}

static bool lcg_combs(unsigned n, unsigned k) {
  const uint32_t* b = binom(n, k);
  uint32_t count = *b;
  uint32_t m = 1; while (m < count) m <<= 1;
  uint32_t a = 4 * randrange(1, m / 4) + 1;
  uint32_t c = 2 * randrange(0, m / 2) + 1;
  uint32_t x = randrange(0, m);
  while (count--) {
    do
      x = (a * x + c) & (m - 1);
    while (x >= *b);
    handle(enumerate(b, x, n, k), n);
  }
  return true;
}

Note: I didn't include the implementation of randrange or shuffle; code is readily available. randrange(low, lim) produces a random integer in the range [low, lim); shuffle(vec, n) randomly shuffles the integer vector vecof length n.

Also, the the loop calls handle(word, n) for each generated word. That must must be replaced with whatever is to be done with each combination.

With handle defined as a function which does nothing, gray_combs took 150 milliseconds on my laptop to find all 40,116,600 28-bit words with 14 bits set. lcg_combs took 5.5 seconds.

rici
  • 234,347
  • 28
  • 237
  • 341
  • 1
    If you always generate the sequence in the same way, and then apply a random bit shuffle to all permutations, doesn't that make the resulting sequence more predictable than if you generated the sequence and then shuffled it randomly? – m69's been on strike for years Jun 16 '17 at 01:55
  • 1
    @m69: Yes, absolutely. (Although the shuffle could also be quite predictable as well if you don't use a cryptographic RNG). More seriously, successive entries are highly correlated (since it is still a Gray sequence, even with the bits shuffled). I avoided the "generate all possibilities and shuffle them" because it could unnecessarily use a lot of memory. I'll edit the answer to include a note and some other solutions. – rici Jun 16 '17 at 03:06
  • @rici Did u test your code snippet from option 1? I catch IndexOutOfRange exception in branch `else if (comb[j + 1] == comb[j] + 1)` after all combinations iterated – UNdedss Jun 17 '17 at 11:48
  • @UNdedss: Yes, I tested it (but not in a language which has IndexOutOfRange exceptions). I also redid the copy and paste from my working copy and checked to make sure there are no significant differences (I meant to change the update to `word` before, so it was worth doing anyway.) The only difference is that my version ensures that `0 < n <= 32` and `0 <= k <= n`; if `k` were greater than `n`, I would expect an out-of-range index error but otherwise I don't see it. – rici Jun 17 '17 at 14:57
  • @rici You was reading memory outside of the array at last iteration. It wasn't used and was unchanged, so it is safe. I just added simple check and everything works fine. But your code has another flaw - once selected bits are used again and again to generate new combinations. This will result in lost bits for some long-lasting and unfinished iterations. Ideal for production solution should use balanced Gray code. Or some king of injected generator for gray code. Can I contact you for more substantial conversation? – UNdedss Jun 19 '17 at 08:34
  • @UNdedss: You're right, there was an out-of-range access of `bit[n]` on the last iteration. I fixed it. (You could have been clearer in your problem report, I was mislead by the line you indicated as erroneous.) The function produces the next combination before it checks that the loop is done, so the last combination is never used. Rather than add a test to the loop, I just made the vector of bitmasks longer... – rici Jun 19 '17 at 14:27
  • @UNdedss: I'm not sure what you mean by "balanced Gray code" but if your criticism is that there are bits which remain unchanged for a long time, including one bit whose value is unchanged until half-way through the iteration, you are quite correct. Perhaps there is a better Gray code iteration; I'll give that some thought (but if you have a reference handy, let me know.) – rici Jun 19 '17 at 14:33
  • @rici I found very understandable description of balanced Gray code in the article [link](https://sciyoshi.com/static/2010/12/gray-codes/report.pdf). In a few days I'll try to find its implementations. – UNdedss Jun 20 '17 at 09:40
  • @UNdedss: Yes, I found a number of references to that particular construction but the details of solving the "delicate part of this construction" are not so easy to find. In any event, that applies to the generation of all binary values, not n/k combinations. Now, it's well-known that you can construct the BRGC for combinations by filtering the BRGC; maybe that would work for balanced codes as well. But that's quite an expensive algorithm since generating all binary strings is exponential. – rici Jun 20 '17 at 15:32
  • @UNdedss: You might find this thesis useful: https://repository.tudelft.nl/islandora/object/uuid:975a4a47-7935-4f76-9503-6d4e36b674a3?collection=research . Let me know how you get on with it. – rici Jun 20 '17 at 15:49
3

Integers with exactly k bits set are easy to generate in order.

You can do that, and then change the order by applying a bit-permutation to the results (see below), for example here's a randomly generated 16-bit (you should pick one with the right number of bits, based on the word size not on the number of set bits) bit-permutation (not tested):

uint permute(uint x) {
  x = bit_permute_step(x, 0x00005110, 1);  // Butterfly, stage 0
  x = bit_permute_step(x, 0x00000709, 4);  // Butterfly, stage 2
  x = bit_permute_step(x, 0x000000a1, 8);  // Butterfly, stage 3
  x = bit_permute_step(x, 0x00005404, 1);  // Butterfly, stage 0
  x = bit_permute_step(x, 0x00000231, 2);  // Butterfly, stage 1
  return x;
}

uint bit_permute_step(uint x, uint m, int shift) {
  uint t;
  t = ((x >> shift) ^ x) & m;
  x = (x ^ t) ^ (t << shift);
  return x;
}

Generating the re-ordered sequence is easy:

uint i = (1u << k) - 1;
uint max = i << (wordsize - k);
do
{
    yield permute(i);
    i = nextPermutation(i);
} while (i != max);
yield permute(i); // for max

Where nextPermutation comes from the linked question,

uint nextPermutation(uint v) {
  uint t = (v | (v - 1)) + 1;
  uint w = t | ((((t & -t) / (v & -v)) >> 1) - 1);
  return w;
}

The bit-permutation should be chosen as a random permutation (eg take 0..(wordsize-1) and shuffle) and then converted to bfly masks (I used programming.sirrida.de/calcperm.php), not as randomly generated bfly masks.

harold
  • 61,398
  • 6
  • 86
  • 164
  • FWIW, a couple of years ago I re-examined the bit-hacking nextPermutation function you quote in light of my theory that division should be avoided even if it requires looping. You can see the results in [this answer](https://stackoverflow.com/a/30951360/1566221) which suggests that looping over a shift is quite a bit faster on average. – rici Jun 17 '17 at 00:20
  • @rici not terribly surprising I suppose, division is disgusting. At first I wanted to quote the tzcnt-based one, but then I thought it would be dangerous because I often receive a couple of drive by downvotes for non-portable things. – harold Jun 17 '17 at 00:24
  • yeah, that one might be even faster on appropriate hardware. But the shift loop is pretty fast. Division turns out to be faster only when there are a *lot* of shifts, which means large `n` and small `k`. But when `k` is very small, the total number of combinations is also small so it doesn't really matter which is faster. – rici Jun 17 '17 at 00:27
2

You can modify the general permutation algorithm to work with binary. Here's an implementation in C++:

#include<iostream>
#include<string>
#include<iostream>
void binaryPermutation(int ones, int digits, std::string current){
        if(digits <= 0 && ones <= 0){
                std::cout<<current<<std::endl;
        }
        else if(digits > 0){
                if(ones > 0){
                        binaryPermutation(ones-1, digits-1, current+"1");
                }
                binaryPermutation(ones, digits-1, current+"0");
        }
}
int main()
{
        binaryPermutation(2, 4, "");
        return 0;
}

This code outputs the following: 1100 1010 1001 0110 0101 0011

You can modify it to store these outputs in a collection or do something other than simply print them.

Paul R
  • 208,748
  • 37
  • 389
  • 560
Jayson Boubin
  • 1,504
  • 2
  • 11
  • 19
2

How about this solution in Python which does permutations?

from itertools import permutations

fixed_length = 4
perms = [''.join(p) for p in permutations('11' + '0' * (fixed_length - 2))]
unique_perms = set(perms)

This would return the numbers as strings, easily convertible with int(num, 2).

As for efficiency, running this took 0.021 milliseconds on my machine.

  • 1
    I think you'd do better to use `itertools.combinations` to generate the _positions_ of the set bits. That way you avoid the repetitions and associated inefficiencies. – Mark Dickinson Jun 15 '17 at 20:03
  • Eh, yeah. But a simple question with no effort shown got a similar answer...it does what was requested. It even has a report of runtime. :) –  Jun 16 '17 at 02:35
2

I think you can use Heap's algorithm. This algorithm generates all possible permutations of n objects. Just create simple array and use algorithm for generating all possible permutations.

This algorithm is non effective if you want to iterate over binary numbers with BINARY operations. For binary operations you can use LFSR.

LFSR is a simple method for iteration over all numbers. I think you can do some simple modifications for generations fixed size zeros numbers with LFSR.

Gor
  • 2,808
  • 6
  • 25
  • 46