Efficiently sampling a random 0-1 sequence of length l with at least k ones

Question

Is there a way to efficiently sample a random 0-1 sequence of length l with at least k ones? When k << l/2, doing reject sampling would be good enough. But when k is relatively large it is trickier.

please check https://stackoverflow.com/questions/61393463/is-there-an-efficient-way-to-generate-n-random-integers-in-a-range-that-have-a-g — leaf_yakitori, Aug 25 '21 at 06:49
Does this answer your question? [Is there an efficient way to generate N random integers in a range that have a given sum or average?](https://stackoverflow.com/questions/61393463/is-there-an-efficient-way-to-generate-n-random-integers-in-a-range-that-have-a-g) — Peter O., Aug 26 '21 at 00:19

score 1 · Accepted Answer · answered Sep 04 '21 at 13:47

If you prefer a rejection-free approach, there is always the possibility to:

precompute the total number t of possible objects, using some combinatorics
draw a random integer number (the “rank”) between 0 and t-1 inclusive
compute the object from its rank, again using some combinatorics

The process of getting the object from its rank is known as unranking.

For example, consider bit strings that are 8 bits long, so n=8, with k at least 5. So k lies in [5, 6, 7, 8].

To compute t, we need to sum the number of combinations of k items out of n=8 items. Chosen items (bits) are set to 1, and the other ones are set to 0.

In order to do this, we have to use binomial coefficients C(n,k) as found in the Pascal Triangle.

Say we use the Python programming language for that. We need some Python code to compute a binomial coefficient:

import  random
import  functools   as  fn
import  operator    as  op

# binomial coefficient C(n,k):
def cnk(n,k):
    p1 = fn.reduce(op.mul, range(n-k+1,n+1), 1)
    p2 = fn.reduce(op.mul, range(1,k+1),     1)
    return  (p1 // p2)

Next, let's compute t within the Python interpreter:

$ python3
Python 3.9.6 (default, Jul 16 2021, 00:00:00) 
>>> 
>>> ks = list(range(5,8+1))
>>> ks
[5, 6, 7, 8]
>>> 
>>> cnk(8,5)
56
>>> 
>>> tas = list(map(lambda k: cnk(8,k), ks))
>>> tas
[56, 28, 8, 1]
>>> 
>>> t = sum(tas)
>>> t
93
>>>

So t is equal to 56+28+8+1=93, and we have 93 possible bit strings overall.

Let's now draw some random rank, between 0 and 92 inclusive:

>>> 
>>> import random
>>> rank = random.randint(0,92)
>>> 
>>> rank
77
>>>

So our drawn rank is 77. Let's describe the unranking process with this value.

The 56 bit strings with k=5 have ranks between 0 and 55, and the 28 bit strings with k=6 have ranks between 56 and 56+28-1=83. So 77 lies within the k=6 range. Specifically, it is the object of combinatorial rank 77-56=21 within n=8 and k=6.

Say we call that last number cbRank=21. Fortunately, the process of unranking a combination of k among n items is well studied; see for example this paper by Genitrini and Pépin.

From the paper, we can easily write this Python function:

# Makes a list of the combination members from the combination's rank:
def combiUnrank(k, rank):  
    k1    = k
    rank1 = rank
    res   = []
    while (rank1 > 0):
        m1 = k1
        while (cnk(m1, k1) <= rank1):
            m1 = m1 + 1
        m1 = m1 -1
        rank1 = rank1 - cnk(m1,k1)
        k1    = k1 - 1
        res.append(m1)
    lRes   =   len(res)
    res    +=  list(range(k - 1 - lRes, -1, -1))  # pad with lowest values
    return res

And then we can use it to see which combination has a rank of 21:

>>> 
>>> cbRank=21
>>> k=6
>>> n=8
>>> 
>>> combiUnrank(k, cbRank)
[7, 6, 5, 3, 2, 1]
>>>

So we've finally got our random 8 bits long bit string, where bits 7, 6, 5, 3, 2, 1 are set to 1 and the remaining ones, 0 and 4, are set to 0. That is, “01110111”. But this last process can also be automated:

# Makes a bit list from a combination:
def getBitListFromCombi(n, cms):
    bits = n * [0]        # get an all-zeroes bit list
    for cm in cms:                 
        bits[cm] = 1      # set the appropriate bits to 1
    return bits

Testing:

>>> 
>>> n=8
>>> k=6
>>> cbRank=21
>>> 
>>> cs = combiUnrank(k, cbRank)
>>> cs
[7, 6, 5, 3, 2, 1]
>>> 
>>> bits = getBitListFromCombi(n,cs)
>>> bits
[0, 1, 1, 1, 0, 1, 1, 1]
>>>

And by construction of the algorithm, all legal bit strings have an identical probablity 1/t of being drawn.

Efficiently sampling a random 0-1 sequence of length l with at least k ones

1 Answers1