6

Given a random source (a generator of random bit stream), how do I generate a uniformly distributed random floating-point value in a given range?

Assume that my random source looks something like:

unsigned int GetRandomBits(char* pBuf, int nLen);

And I want to implement

double GetRandomVal(double fMin, double fMax);

Notes:

  • I don't want the result precision to be limited (for example only 5 digits).
  • Strict uniform distribution is a must
  • I'm not asking for a reference to an existing library. I want to know how to implement it from scratch.
  • For pseudo-code / code, C++ would be most appreciated
Lior Kogan
  • 19,919
  • 6
  • 53
  • 85
  • 3
    Why reinvent the wheel? There are good (pseudo) random sources for all platforms. – the JinX Feb 16 '11 at 10:21
  • Sounds quite similar to an interview question I had for a job at Google! – Nick Feb 16 '11 at 10:27
  • Have you considered the possibility that there aren't exactly 2^n possible values in the provided range? Just how uniform is uniform? :) – Karl Knechtel Feb 16 '11 at 10:48
  • 2
    @the JinX: pseudo random and actually random are vasty different beasts... consider making one-time encryption pads - you wouldn't do that with rand() ;-P – Tony Delroy Feb 16 '11 at 11:08
  • Very true, but creating your own random stuff is usually even worse. Most systems also have a good 'true' random source. .. – the JinX Feb 16 '11 at 12:55
  • 2
    @the JinX: The question is about how to turn that existing "good random source" *of bits* into a random source of uniformly-distributed *doubles*. – j_random_hacker Feb 17 '11 at 04:43
  • @Karl: Perfectly uniform distributions on non-powers-of-2 are easily made by discarding and retrying when out-of-bounds results are produced. – j_random_hacker Feb 17 '11 at 04:51

8 Answers8

9

I don't think I'll ever be convinced that you actually need this, but it was fun to write.

#include <stdint.h>

#include <cmath>
#include <cstdio>

FILE* devurandom;

bool geometric(int x) {
  // returns true with probability min(2^-x, 1)
  if (x <= 0) return true;
  while (1) {
    uint8_t r;
    fread(&r, sizeof r, 1, devurandom);
    if (x < 8) {
      return (r & ((1 << x) - 1)) == 0;
    } else if (r != 0) {
      return false;
    }
    x -= 8;
  }
}

double uniform(double a, double b) {
  // requires IEEE doubles and 0.0 < a < b < inf and a normal
  // implicitly computes a uniform random real y in [a, b)
  // and returns the greatest double x such that x <= y
  union {
    double f;
    uint64_t u;
  } convert;
  convert.f = a;
  uint64_t a_bits = convert.u;
  convert.f = b;
  uint64_t b_bits = convert.u;
  uint64_t mask = b_bits - a_bits;
  mask |= mask >> 1;
  mask |= mask >> 2;
  mask |= mask >> 4;
  mask |= mask >> 8;
  mask |= mask >> 16;
  mask |= mask >> 32;
  int b_exp;
  frexp(b, &b_exp);
  while (1) {
    // sample uniform x_bits in [a_bits, b_bits)
    uint64_t x_bits;
    fread(&x_bits, sizeof x_bits, 1, devurandom);
    x_bits &= mask;
    x_bits += a_bits;
    if (x_bits >= b_bits) continue;
    double x;
    convert.u = x_bits;
    x = convert.f;
    // accept x with probability proportional to 2^x_exp
    int x_exp;
    frexp(x, &x_exp);
    if (geometric(b_exp - x_exp)) return x;
  }
}

int main() {
  devurandom = fopen("/dev/urandom", "r");
  for (int i = 0; i < 100000; ++i) {
    printf("%.17g\n", uniform(1.0 - 1e-15, 1.0 + 1e-15));
  }
}
a dabbler
  • 412
  • 2
  • 4
  • One could make this work for a denormal or zero by replacing frexp with a piece of code that reads the actual exponent. Negative a and b wouldn't be too much harder after that (though you'd have to account for the fact that doubles are in sign-magnitude order and adjust the sampling of x_bits accordingly). – a dabbler Feb 16 '11 at 14:30
  • Unbelievable! This is awesome! – Lior Kogan Feb 16 '11 at 18:41
5

Here is one way of doing it.

The IEEE Std 754 double format is as follows:

[s][     e     ][                          f                         ]

where s is the sign bit (1 bit), e is the biased exponent (11 bits) and f is the fraction (52 bits).

Beware that the layout in memory will be different on little-endian machines.

For 0 < e < 2047, the number represented is

(-1)**(s)   *  2**(e – 1023)  *  (1.f)

By setting s to 0, e to 1023 and f to 52 random bits from your bit stream, you get a random double in the interval [1.0, 2.0). This interval is unique in that it contains 2 ** 52 doubles, and these doubles are equidistant. If you then subtract 1.0 from the constructed double, you get a random double in the interval [0.0, 1.0). Moreover, the property about being equidistant is preserve. From there you should be able to scale and translate as needed.

user515430
  • 3,341
  • 2
  • 17
  • 13
4

I'm surprised that for question this old, nobody had actual code for the best answer. User515430's answer got it right--you can take advantage of IEEE-754 double format to directly put 52 bits into a double with no math at all. But he didn't give code. So here it is, from my public domain ojrandlib:

double ojr_next_double(ojr_generator *g) {
    uint64_t r = (OJR_NEXT64(g) & 0xFFFFFFFFFFFFFull) | 0x3FF0000000000000ull;
    return *(double *)(&r) - 1.0;
}

NEXT64() gets a 64-bit random number. If you have a more efficient way of getting only 52 bits, use that instead.

Community
  • 1
  • 1
Lee Daniel Crocker
  • 12,927
  • 3
  • 29
  • 55
  • I believe that your method would not result in strict uniform distribution. There is no 1-to-1 mapping from the The 2^52 values in the range [1..2] to the range [0..1] that can be uniform. Some random values would be more common than others. – Lior Kogan May 31 '13 at 13:29
  • It's as uniform as possible in IEEE-754 double. The 2^52 double values here are in fact equidistant, ranging from 1.0 to the largest value < 2.0 representable as a double (on my machine this will print as 1.999999999999999778). – Lee Daniel Crocker May 31 '13 at 13:50
  • I should also point out that I have put this code through extensive testing for uniformity and found it quite robust. My library includes the tests which you are welcome to examine. – Lee Daniel Crocker May 31 '13 at 13:56
  • >> "It's as uniform as possible" - I can't agree with this. The challenge is generating all representable values in the range [0..1) with a PDF that ensures uniformity. Your implementation may be good for any practical purpose, but far from perfect (in a theoretical sense). – Lior Kogan May 31 '13 at 14:20
  • You're right, in that there are more than 2^52 representable values in the interval [0,1), but they are not uniform, so code that makes it possible to generate all of these while still being uniform would be quite complex. I suppose it's doable, and might be an interesting exercise, but you'd only gain one extra bit of precision over the code given, so I doubt it would be worth it. – Lee Daniel Crocker May 31 '13 at 14:30
  • You're also right ;-) It won't worth the trouble for the range [0,1), but what about an arbitrary range, say [0,1e6)? Multiplying the result of by 1e6 would be very suboptimal. – Lior Kogan May 31 '13 at 14:42
  • That's a fair point. I would hope that someone writing code for which that made a difference wouldn't use my next_double() function (which is documented to have only 52 bits of precision), but would roll his own based on 64-bit (or larger) integers. I generally hate floating point for any purpose, but people demand it, so I put it in the library (and it was needed for the normals and exponentials too). – Lee Daniel Crocker May 31 '13 at 14:48
3

This is easy, as long as you have an integer type with as many bits of precision as a double. For instance, an IEEE double-precision number has 53 bits of precision, so a 64-bit integer type is enough:

#include <limits.h>
double GetRandomVal(double fMin, double fMax) {
  unsigned long long n ;
  GetRandomBits ((char*)&n, sizeof(n)) ;
  return fMin + (n * (fMax - fMin))/ULLONG_MAX ;
}
TonyK
  • 16,761
  • 4
  • 37
  • 72
  • 1
    Actually, you are mapping from 2^64 possible values to 2^53 possible values Such mappling will not supply uniform distribution (yes, such accuracy is important for me). – Lior Kogan Feb 16 '11 at 11:47
  • 4
    *Floating-point numbers* aren't uniformly distributed. If you need better than that, you're going to have to build yourself the mythical Real RAM. – a dabbler Feb 16 '11 at 11:51
  • @a dabbler: Thank you, this is true indeed. However, won't floating-points numbers in a given narrow range (from 34 to 35 for example) be uniformly distributed? – Lior Kogan Feb 16 '11 at 11:56
  • @Lior Kogan: Why wouldn't it be uniform? By compressing the values, you don't change uniformity any more than the fundamental graininess of a IEEE float. – tenfour Feb 16 '11 at 13:30
  • To play devil's advocate: this function doesn't make use of the full significand for values near zero, and perhaps the questioner wants to perform additional, accuracy-sensitive processing in the rare event that the random number is very small compared to the bounds. The *right* thing to do is probably to pull an additional random value, but who knows? As usual, we don't know what the real problem is. – a dabbler Feb 16 '11 at 13:56
  • @Lior Kogan: OK, tell us what fMin and fMax are, and I'll update my answer. Otherwise this thing is hard: Supppose, for instance, fMin = 0 and fMax = 1, and you require that every possible representation of a number x in [0,1] is chosen with probalility equal to the size of the interval it represents. So there are about 2^63 such intervals, but their size varies from about 2^-1023 to about 2^-53. So you can end up needing 1023 random bits in the worst case. – TonyK Feb 16 '11 at 14:34
2

This is probably not the answer you want, but the specification here:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3225.pdf

in sections [rand.util.canonical] and [rand.dist.uni.real], contains sufficient information to implement what you want, though with slightly different syntax. It isn't easy, but it is possible. I speak from personal experience. A year ago I knew nothing about random numbers, and I was able to do it. Though it took me a while... :-)

Howard Hinnant
  • 206,506
  • 52
  • 449
  • 577
1

The question is ill-posed. What does uniform distribution over floats even mean?

Taking our cue from discrepancy, one way to operationalize your question is to define that you want the distribution that minimizes the following value:

\int_{t=fmin}^{fmax} \left(p\left(x \leq \text{t} \right ) - \frac{t-fmin}{fmax-fmin} \right )^2dt

Where x is the random variable you are sampling with your GetRandomVal(double fMin, double fMax) function, and p(x <= t means the probability that a random x is smaller or equal to t.

And now you can go on and try to evaluate eg a dabbler's answer. (Hint all the answers that fail to use the whole precision and stick to eg 52 bits will fail this minimization criterion.)

However, if you just want to be able to generate all float bit patterns that fall into your specified range with equal possibility, even if that means that eg asking for GetRandomVal(0,1000) will create more values between 0 and 1.5 than between 1.5 and 1000, that's easy: any interval of IEEE floating point numbers when interpreted as bit patterns map easily to a very small number of intervals of unsigned int64. See eg this question. Generating equally distributed random values of unsigned int64 in any given interval is easy.

Matthias
  • 133
  • 2
  • 10
0

I may be misunderstanding the question, but what stops you simply sampling the next n bits from the random bit stream and converting that to a base 10 number number ranged 0 to 2^n - 1.

Ben
  • 2,858
  • 1
  • 18
  • 11
  • ... Which would be an integer range 0 to 2^n-1. I want a floating-point ranged Min to Max. – Lior Kogan Feb 16 '11 at 11:51
  • @Lior Kogan why assume that Ben is talking about an integer. A stream of digits can also be floating point. Sample n bits for the integral part and n bits for the fractional part. – tenfour Feb 16 '11 at 23:08
  • Or sample two integers and divide the smaller by the larger. – image_doctor May 15 '12 at 10:33
0

To get a random value in [0..1[ you could do something like:

double value = 0;
for (int i=0;i<53;i++)
   value = 0.5 * (value + random_bit());  // Insert 1 random bit
   // or value = ldexp(value+random_bit(),-1);
   // or group several bits into one single ldexp
return value;
Eric Bainville
  • 9,738
  • 1
  • 25
  • 27