I looked at the "Fast Dice Roller" (FDR) pointed to by @Peter.O, which is indeed simple (and avoids dividing). But each time a random number is generated, this will eat some number of bits and discard the fraction of those bits it does not use.
The "batching"/"pooling" techniques seem to do better than FDR, because the unused fractions of bits are (at least partly) retained.
But interestingly, the DrMath thing you referenced is basically the same as the FDR, but does not start from scratch for each random value it returns.
So the FDR to return 0..n-1 goes:
random(n):
    m = 1 ; r = 0
    while 1:
        # Have r random and evenly distributed in 0..m-1
        # Need m >= n -- can double m and double r, adding a random bit, until
        # we get that. r remains evenly distributed in 0..m-1
        while m < n: r = 2*r + next_bit() ; m = m*2
        # Now have r < m and n <= m < n*2
        if r < n: return r          # Hurrah !
        # Have overshot, so reduce m and r to m MOD n and r MOD n
        m -= n ; r -= n ;

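For concreteness, here is a minimal Python sketch of that pseudocode. Note that next_bit() is my stand-in for whatever supplies the (expensive) random bits -- here it is faked with Python's random module, purely so the thing runs:

import random as _pyrandom

def next_bit():
    # stand-in bit source, for testing only -- in real use this would be
    # the expensive true-random bit generator we are trying to be frugal with
    return _pyrandom.getrandbits(1)

def random_fdr(n):
    # Fast Dice Roller: return a value uniformly distributed in 0..n-1
    m = 1 ; r = 0
    while 1:
        # double m (feeding a bit into r) until m >= n;
        # r remains evenly distributed in 0..m-1
        while m < n:
            r = 2*r + next_bit()
            m = m*2
        if r < n:
            return r                # Hurrah !
        # overshot: reduce m and r to m MOD n and r MOD n, and go round again
        m -= n ; r -= n
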
The DrMath thing goes:
# Initialisation once before first call of random(n)
ms = 1 ; rs = 0
N = ...                             # N >= maximum n and N*2 does not overflow

# The function -- using the "static"/"global" ms, rs and N
random(n):
    m = ms ; r = rs
    while 1:
        # Same as FDR -- except work up to N, not n
        while m < N: r = 2*r + next_bit() ; m = m*2 ;
        # Now have r < m and m >= N
        # Set nq = largest multiple of n <= m
        # In FDR, at this point q = 1 and nq = n
        q  = m DIV n ;
        nq = n * q
        if r < nq:                  # all set if r < nq
            # in FDR ms = 1, rs = 0
            ms = q                  # keep stuff not used this time
            rs = r DIV n            # ditto
            return r MOD n          # hurrah !
        # Overshot, so reduce MOD n*q -- remembering, for FDR q == 1
        m = m - nq
        r = r - nq

which, as noted, is basically the same as FDR, but keeps track of the unused randomness.
When testing I found:
FDR: for 100000 values range=3 used 266804 bits cost=1.6833
DrMath: for 100000 values range=3 used 158526 bits cost=1.0002
Where the cost is bits-used / (100000 * log2(3)), noting that log2(3) is approximately 1.58496. (So the cost is the number of bits used divided by the number of bits one would hope to use.)
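For example, the two range=3 costs above work out, in Python, as:

import math

# cost = bits-used / (number-of-values * log2(range))
print(266804 / (100000 * math.log2(3)))     # FDR    -> 1.6833...
print(158526 / (100000 * math.log2(3)))     # DrMath -> 1.0002...
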
Also:
FDR: for 100000 values range=17: 576579 bits cost=1.4106
DrMath: for 100000 values range=17: 408774 bits cost=1.0001
And:
FDR: for 100000 values ranges=5..60: 578397 bits cost=1.2102
DrMath: for 100000 values ranges=5..60: 477953 bits cost=1.0001
where I constructed 100000 values, and for each one chose a range in 5..60 (inclusive).
It seems to me that DrMath has it ! Though for larger ranges it has less of an advantage.
Mind you... DrMath uses at least 2 divisions per random value returned, which gives me conniptions run-time-wise. But you did say you weren't interested in run-time efficiency.
How does it work ?
So, we want a sequence of random values r to be uniformly distributed in a range 0..n-1. Inconveniently, we only have a source of randomness which gives us random values which are uniformly distributed in 0..m-1. Typically m will be a power of 2 -- and let us assume that n < m (if n == m the problem is trivial, if n > m the problem is impossible). For any r we can take r MOD n to give a random value in the required range. If we only use r when r < n then (trivially) we have the uniform distribution we want. If we only use r when r < (n * q) and (n * q) <= m we also have a uniform distribution. We are here "rejecting" r which are "too big". The fewer r we reject, the better. So we should choose q such that (n * q) <= m < (n * (q+1)) -- so n * q is the largest multiple of n less than or equal to m. This, in turn, tells us that n "much less" than m is to be preferred.

When we "reject" a given r we can throw it all away, but that turns out not to be completely necessary. Also, m does not have to be a power of 2. But we will get to that later.

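As a tiny illustration of that acceptance rule (my own numbers, not anything from the DrMath page): with n = 3 and m = 16 we get q = 5 and n*q = 15, so only r = 15 is rejected, and the 15 accepted values map onto 0..2 perfectly evenly:

from collections import Counter

n, m = 3, 16
q  = m // n                  # 5
nq = n * q                   # 15 -- largest multiple of n <= m
print(Counter(r % n for r in range(nq)))
# each value 0..n-1 occurs exactly q = 5 times
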
Here is some working Python:
M = 1
R = 0
N = (2**63)                         # N >= maximum range
REJECT_COUNT = 0

def random_drmath(n):
    global M, R, REJECT_COUNT
    # (1) load m and r "pool"
    m = M
    r = R
    while 1:
        # (2) want N <= m < N*2
        #     have 0 <= r < m, and that remains true.
        #     also r uniformly distributed in 0..m-1, and that remains true
        while m < N:
            r = 2*r + next_bit()
            m = m*2
        # (3) need r < nq where nq = largest multiple of n <= m
        q  = m // n
        nq = n * q
        if r < nq:
            # (4) update the m and r "pool" and return 0..n-1
            M = q
            R = r // n
            return r % n            # hurrah !
        # (5) reject: so reduce both m and r by MOD n*q
        m = m - nq
        r = r - nq
        REJECT_COUNT += 1

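To reproduce the sort of figures quoted above, my test harness (just a sketch, not part of the DrMath code) replaces next_bit() with a version that counts the bits consumed:

import math, random as _pyrandom

BITS_USED = 0

def next_bit():
    # counting stand-in for the real bit source
    global BITS_USED
    BITS_USED += 1
    return _pyrandom.getrandbits(1)

# 100000 values in range 3, as in the first table above
for _ in range(100000):
    random_drmath(3)
cost = BITS_USED / (100000 * math.log2(3))
print("used %d bits  cost=%.4f  rejects=%d" % (BITS_USED, cost, REJECT_COUNT))
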
Must have N >= maximum range, preferably much bigger. 2**31 or 2**63 are obvious choices.

On the first call of random_drmath() step (2) will read random bits to "fill the pool". For N = 2**63, we will end up with m = 2**63 and r with 63 random bits. Clearly r is random and uniformly distributed in 0..m-1. [So far, so good.]

Now (and on all further calls of random_drmath()) we hope to extract a random value uniformly in 0..n-1 from r, as discussed above. So step (3) constructs nq, which is the largest multiple of n less than or equal to m. If r >= nq we cannot use it, because there are fewer than n values in nq..m-1 -- this is the usual "reject" criterion.

So, where r < nq we can return a value -- step (4). The trick here is to think of m and r as numbers "base-n". The least significant "digit" of r is extracted (r % n) and returned. Then m and r are shifted right by one "digit" (q = m // n and r // n), and stored in the "pool". I think it is clear that at this point we still have r < m, with r random and uniformly distributed in 0..m-1. But m is no longer a power of 2 -- and that's OK.

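A quick way to convince yourself of the "base-n" view (again my own little check): condition on acceptance and look at what goes back into the pool. With n = 3 and m = 16 (so q = 5, nq = 15), the new R = r // n is uniform in 0..q-1, i.e. 0..M-1 for the new M = q:

from collections import Counter

n, m = 3, 16
q, nq = m // n, n * (m // n)                 # q = 5, nq = 15
print(Counter(r // n for r in range(nq)))
# each of 0..q-1 occurs exactly n = 3 times -- uniform in 0..q-1
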
But, if r >= nq we must reduce r and m together -- step (5) -- and try again. Trivially, we could set m = 1 ; r = 0 and start again. But what we do is subtract nq from both m and r. That leaves r uniformly distributed in 0..m-1. This last step feels like magic, but we know that r is in nq..m-1 and each possible value has equal probability, so r-nq is in the range 0..m-nq-1 and each possible value still has equal probability ! [Remember that the 'invariant' at the top of the while loop is that r is random and uniformly distributed in 0..m-1.]

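And the same sort of check for the rejection step (my numbers again): with n = 6 and m = 16 we get q = 2 and nq = 12, so the rejected values are r = 12..15, each equally likely; subtracting nq leaves them uniform in 0..3, which is exactly "uniform in 0..m-1" for the new m = 16 - 12 = 4:

n, m = 6, 16
q, nq = m // n, n * (m // n)                 # q = 2, nq = 12
print([r - nq for r in range(nq, m)], "new m =", m - nq)
# [0, 1, 2, 3] new m = 4
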
For small n the rejection step will discard most of r, but for small n (compared to N) we hope not to reject very often. Conversely, for large n (compared to N) we may expect to reject more often, but this retains at least some of the random bits we have eaten so far. I feel there might be a way to retain more of r... but I haven't thought of a simple way to do that... and if the cost of reading one random bit is high, it might be worth trying to find a not-simple way !
FWIW: setting N = 128 I get:
FDR: for 100000 values ranges=3.. 15: 389026 bits cost=1.2881
DrMath: for 100000 values ranges=3.. 15: 315815 bits cost=1.0457
FDR: for 100000 values ranges 3.. 31: 476428 bits cost=1.2371
DrMath: for 100000 values ranges 3.. 31: 410195 bits cost=1.0651
FDR: for 100000 values ranges 3.. 63: 568687 bits cost=1.2003
DrMath: for 100000 values ranges 3.. 63: 517674 bits cost=1.0927
FDR: for 100000 values ranges 3..127: 664333 bits cost=1.1727
DrMath: for 100000 values ranges 3..127: 639269 bits cost=1.1284
so as n approaches N the cost per value goes up.