
Generate a random number in the range [x..y], where x and y are arbitrary floating point numbers. Use the function random(), which returns a random floating point number in the range [0..1] from P uniformly distributed numbers (call it "density"). The uniform distribution must be preserved, and P must be scaled as well.

I think there is no easy solution for such a problem. To simplify it a bit, I ask you: how would you generate a number in the interval [-0.5 .. 0.5], then in [0 .. 2], then in [-2 .. 0], preserving uniformness and density? Thus, for [0 .. 2] it must generate a random number from P*2 uniformly distributed numbers.

The obvious simple solution random() * (x - y) + y will not generate all possible numbers, because the density is too low in all abs(x-y) > 1.0 cases; many possible values will be missed. Remember that random() returns only one of P possible numbers. If you multiply such a number by Q, it still gives you only one of P possible values, just scaled by Q, whereas the density P has to be scaled by Q as well.
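To make the density loss concrete, here is a small Python sketch (the base generator with P = 5 values is a hypothetical stand-in): scaling by 10 still yields only P distinct outputs, where a generator of the target density would have P*10.

```python
import random

random.seed(0)
P = 5  # hypothetical density: the base generator returns one of P values

def base_random():
    # one of the P equally spaced values 0, 1/P, ..., (P-1)/P
    return random.randrange(P) / P

# Naive scaling to [0, 10): still only P distinct outputs, spaced 2.0 apart,
# whereas a generator of density P*10 would give 50 values spaced 0.2 apart.
outputs = {base_random() * 10 for _ in range(10_000)}
print(sorted(outputs))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```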

psihodelia
  • reading your comments, it looks like you already know the answer (or you would not be that imperative in your comments, would you?). I would like to see your solution to this problem, with an argued discussion of why it works where the answers given by other people do not. – Adrien Plisson Nov 05 '11 at 11:19
  • Very similar: http://stackoverflow.com/questions/5015133/generating-random-floating-point-values-based-on-random-bit-stream – Lior Kogan Nov 06 '11 at 05:49
  • Are you looking for a uniform distribution over all of the IEEE floating point representations, or a uniform distribution over the numeric range, represented as floating point numbers? The latter is only possible if your required resolution can be expressed in the mantissa. – Fantius Nov 08 '11 at 13:56
  • Best place to start is `http://en.wikipedia.org/wiki//dev/random` - how you define "preserving uniformness and density" differs from how I do; it reminds me of Cantor's different levels of infinity. – Alvin K. Nov 10 '11 at 01:56

9 Answers

3

If I understand your problem correctly, here is a solution (note that I would exclude 1 from the range):

N = numbers_in_your_random // [0, 0.2, 0.4, 0.6, 0.8] will be 5

// This turns your random number generator into one that returns integer values in [0..N[.
function randomInt()
{
    return random()*N;
}

// This extends the integer random number generator to an arbitrary
// integer range [0..maxValue[.
function getRandomInt(maxValue)
{
    if (maxValue < N)
    {
        return randomInt() % maxValue;
    }
    else
    {
        baseValue = randomInt();
        bRate = maxValue DIV N;
        bMod = maxValue % N;
        if (baseValue < bMod)
        {
            bRate++;
        }
        return N*getRandomInt(bRate) + baseValue;
    }
}

// This will return random number in range [lower, upper[ with the same density as random()
function extendedRandom(lower, upper)
{
    diff = upper - lower;
    ndiff = diff * N;
    baseValue = getRandomInt(ndiff);
    baseValue/=N;
    return lower + baseValue;
}
Calmarius
  • Your function will not generate all possible numbers in the range (lower, upper). Many possible numbers will be missed if upper and lower are large values. – psihodelia Nov 05 '11 at 10:43
  • Well, it depends on the random function you use. Some functions can generate 32768 different numbers, some 4294967296. – Calmarius Nov 05 '11 at 10:49
  • Imagine your random() returns only 0, 0.5, or 1. Apply your method with lower=0 and upper=100 and you will see the problem. – psihodelia Nov 05 '11 at 10:51
  • Aha! You should edit your question so that we cannot assume your random() is continuous; it may return discrete numbers distributed uniformly. And you want a function that returns numbers uniformly and keeps the same distance between them, is that it? By the way, the obvious solution keeps it uniform, but with bigger distances between the numbers. :) – Calmarius Nov 05 '11 at 10:56
  • Please read my updated question, I have specified certain points – psihodelia Nov 05 '11 at 11:25
2

If you really want to generate all possible floating point numbers in a given range with uniform numeric density, you need to take into account the floating point format. For each possible value of your binary exponent, you have a different numeric density of codes. A direct generation method will need to deal with this explicitly, and an indirect generation method will still need to take it into account. I will develop a direct method; for the sake of simplicity, the following refers exclusively to IEEE 754 single-precision (32-bit) floating point numbers.

The most difficult case is any interval that includes zero. In that case, to produce an exactly even distribution, you will need to handle every exponent down to the lowest, plus denormalized numbers. As a special case, you will need to split zero into two cases, +0 and -0.

In addition, if you are paying such close attention to the result, you will need to make sure that you are using a good pseudorandom number generator with a large enough state space that you can expect it to hit every value with near-uniform probability. This disqualifies the C/Unix rand() and possibly the *rand48() library functions; you should use something like the Mersenne Twister instead.


The key is to dissect the target interval into subintervals, each of which is covered by a different combination of binary exponent and sign: within each subinterval, floating point codes are uniformly distributed.

The first step is to select the appropriate subinterval, with probability proportional to its size. If the interval contains 0, or otherwise covers a large dynamic range, this may potentially require a number of random bits up to the full range of the available exponent.

In particular, for a 32-bit IEEE-754 number, there are 256 possible exponent values. Each exponent governs a range which is half the size of the next greater exponent, except for the denormalized case, which is the same size as the smallest normal exponent region. Zero can be considered the smallest denormalized number; as mentioned above, if the target interval straddles zero, the probability of each of +0 and -0 should perhaps be cut in half, to avoid doubling its weight.

If the subinterval chosen covers the entire region governed by a particular exponent, all that is necessary is to fill the mantissa with random bits (23 bits, for 32-bit IEEE-754 floats). However, if the subinterval does not cover the entire region, you will need to generate a random mantissa that covers only that subinterval.

The simplest way to handle both the initial and secondary random steps may be to round the target interval out to include the entirety of all exponent regions partially covered, then reject and retry numbers that fall outside it. This allows the exponent to be generated with simple power-of-2 probabilities (e.g., by counting the number of leading zeroes in your random bitstream), as well as providing a simple and accurate way of generating a mantissa that covers only part of an exponent interval. (This is also a good way of handling the +/-0 special case.)

As another special case: to avoid inefficient generation for target intervals which are much smaller than the exponent regions they reside in, the "obvious simple" solution will in fact generate fairly uniform numbers for such intervals. If you want exactly uniform distributions, you can generate the sub-interval mantissa by using only enough random bits to cover that sub-interval, while still using the aforementioned rejection method to eliminate values outside the target interval.
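A minimal Python sketch of the rejection scheme described above, written from first principles rather than as the answer's exact design (it assumes a positive target interval and 64-bit IEEE-754 doubles; `uniform_dense` is a hypothetical name): round the interval out at the top to a power of two, pick an exponent region with probability proportional to its width via a run of coin flips, fill the 52-bit mantissa with random bits, and reject values outside the target.

```python
import math
import random

def uniform_dense(lo, hi):
    """Uniform over [lo, hi) while reaching (nearly) every representable
    double in the interval. Sketch only: assumes 0 < lo < hi and normal
    numbers; denormals and signed zero are not handled."""
    b = math.ceil(math.log2(hi))  # round the interval out at the top to 2**b
    while True:
        # Choose exponent e <= b-1 with P(e) proportional to region width
        # 2**e: each coin tail halves the width, so flip until the first head.
        e = b - 1
        while random.getrandbits(1) == 0:
            e -= 1
        # Uniform over the 2**52 mantissa codes of region [2**e, 2**(e+1)).
        x = (2.0 ** e) * (1.0 + random.getrandbits(52) / 2.0 ** 52)
        if lo <= x < hi:  # reject and retry anything outside the target
            return x
```

Within each region the density is P(e) / 2**e = 2**(-b), a constant, so the accepted samples are numerically uniform while still being able to hit every mantissa code.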

comingstorm
  • Do you have/can you point to code that implements your suggestions? – pholser May 14 '13 at 19:59
  • Sorry, no; this was just a sketch from first principles. – comingstorm May 14 '13 at 20:18
  • Let me see if I understand this correctly. For example, if my target range were [1.0, 8.0), I'd choose sign/exponent pair +/0 with probability p, +/1 with probability 2p, and +/2 with probability 4p, where 7p = 1. If my interval were [1.0, 9.0), would this scheme over-weight the pair +/3? – pholser May 29 '13 at 18:32
  • The simple rejection mechanism would round the [1.0,9.0) interval out to [1.0,16.0), then retry if the result is >= 9.0 (this is not especially efficient, but it isn't awful, either...) – comingstorm Oct 31 '13 at 17:23
  • For the interval [1.0, 9.0), would the sign/exponent pairs and probabilities be: +/0: p, +/1: 2p, +/2: 4p, and +/3: p, with 8p = 1, and then hits on +/3 >= 9.0 discarded? – pholser Jun 14 '14 at 14:30
1

well, [0..1] * 2 == [0..2] (still uniform)

[0..1] - 0.5 == [-0.5..0.5] etc.

I wonder where you experienced such an interview?

Update: well, if we want to start caring about losing precision on multiplication (which is weird, because somehow you did not care about that in the original task) and pretend we care about the "number of values", we can start iterating. To do that, we need one more function, which returns uniformly distributed random values in [0..1); this can be done by dropping the 1.0 value should it ever appear. After that, we can slice the whole range into equal parts small enough that we need not care about losing precision, choose one part at random (we have enough randomness to do that), and choose a number in that part using the [0..1) function, for all parts but the last one.

Or, you can come up with a way to encode enough values to care about, and just generate random bits for this code, in which case you don't really care whether it's [0..1] or just {0, 1}.
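The slice-and-pick idea above can be sketched in Python (unit-width buckets chosen for illustration; `extended_uniform` is a hypothetical name, and rejection handles the shorter last part):

```python
import math
import random

def extended_uniform(lower, upper):
    # Slice [lower, upper) into unit-width buckets, pick one uniformly,
    # then place a point inside it with the [0..1) generator. The last
    # bucket may stick out past upper, so overshoots are redrawn.
    n = math.ceil(upper - lower)          # number of buckets
    while True:
        k = random.randrange(n)           # pick a bucket
        x = lower + k + random.random()   # offset within the bucket
        if x < upper:
            return x
```

Because the [0..1) generator is never multiplied by a large factor, each bucket keeps the generator's full density.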

alf
  • using [0..1]*2 omits almost half of possible values – psihodelia Nov 05 '11 at 10:47
  • @psihodelia see the update; it gets well beyond any interview or practical task I can imagine. To start with, "random FP number from [0..1] uniformly" is not the same as "FP representation of random number from [0..1] uniformly": the closer you are to zero, the more FP values you have. – alf Nov 05 '11 at 11:07
  • Please read my updated question, I have specified certain points – psihodelia Nov 05 '11 at 11:27
  • See the updated answer :) Split in parts, get a part randomly, handle overlapping points—that's it. – alf Nov 05 '11 at 11:39
1

Let me rephrase your question:

Let random() be a random number generator with a discrete uniform distribution over [0,1). Let D be the number of possible values returned by random(), each of which is precisely 1/D greater than the previous. Create a random number generator rand(L, U) with a discrete uniform distribution over [L, U) such that each possible value is precisely 1/D greater than the previous.

--

A couple quick notes.

  1. The problem in this form, as you phrased it, is unsolvable in general. That is, if D = 1 there is nothing we can do.
  2. I don't require that 0.0 be one of the possible values for random(). If it is not, then it is possible that the solution below will fail when U - L < 1 / D. I'm not particularly worried about that case.
  3. I use all half-open ranges because it makes the analysis simpler. Using your closed ranges would be simple, but tedious.

Finally, the good stuff. The key insight here is that the density can be maintained by independently selecting the whole and fractional parts of the result.

First, note that given random() it is trivial to create randomBit(). That is,

randomBit() { return random() >= 0.5; }

Then, if we want to select one of {0, 1, 2, ..., 2^N - 1} uniformly at random, that is simple using randomBit(), just generate each of the bits. Call this random2(N).

Using random2() we can select one of {0, 1, 2, ..., N - 1}:

randomInt(N) { while ((val = random2(ceil(log2(N)))) >= N); return val; }

Now, if D is known, then the problem is trivial as we can reduce it to simply choosing one of floor((U - L) * D) values uniformly at random and we can do that with randomInt().

So, let's assume that D is not known. Now, let's first make a function to generate random values in the range [0, 2^N) with the proper density. This is simple.

rand2D(N) { return random2(N) + random(); }

rand2D() is where we require that the difference between consecutive possible values for random() be precisely 1/D. If not, the possible values here would not have uniform density.

Next, we need a function that selects a value in the range [0, V) with the proper density. This is similar to randomInt() above.

randD(V) { while ((val = rand2D(ceil(log2(V)))) >= V); return val; }

And finally...

rand(L, U) { return L + randD(U - L); }

We now may have offset the discrete positions if L / D is not an integer, but that is unimportant.

--

A last note, you may have noticed that several of these functions may never terminate. That is essentially a requirement. For example, random() may have only a single bit of randomness. If I then ask you to select from one of three values, you cannot do so uniformly at random with a function that is guaranteed to terminate.
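Putting the functions above together, a Python sketch: random() is simulated here by a hypothetical coarse generator with D = 16 equally spaced values, so the 1/D grid spacing of the results can be checked directly (function names are lowercased variants of those above).

```python
import math
import random

D = 16  # density of the simulated base generator

def base_random():
    # stands in for random(): one of the D values 0, 1/D, ..., (D-1)/D
    return random.randrange(D) / D

def random_bit():
    return 1 if base_random() >= 0.5 else 0

def random2(n):
    # uniform integer in [0, 2**n), built bit by bit
    v = 0
    for _ in range(n):
        v = (v << 1) | random_bit()
    return v

def rand2d(n):
    # value in [0, 2**n) on the 1/D grid, with uniform density
    return random2(n) + base_random()

def rand_d(v):
    # value in [0, v) on the 1/D grid, by rejection
    bits = max(0, math.ceil(math.log2(v)))
    while True:
        val = rand2d(bits)
        if val < v:
            return val

def rand(l, u):
    return l + rand_d(u - l)
```

As the answer notes, the rejection loops are not guaranteed to terminate in any fixed number of steps; they terminate with probability 1.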

Chris Hopman
1

Consider this approach:

I'm assuming the base random number generator in the range [0..1] generates one of the numbers

0, 1/(p-1), 2/(p-1), ..., (p-2)/(p-1), (p-1)/(p-1)

If the target interval length is less than or equal to 1, return random()*(y-x) + x.

Else, map each number r from the base RNG to an interval in the target range:

[r*(p-1)*(y-x)/p, (r+1/(p-1))*(p-1)*(y-x)/p]

(i.e. to each of the p possible numbers assign one of p intervals of length (y-x)/p)

Then recursively generate another random number in that interval and add it to the interval begin.

Pseudocode:

const p;

function rand(x, y)
  r = random()
  if y-x <= 1
    return x + r*(y-x)
  else
    low = r*(p-1)*(y-x)/p
    high = low + (y-x)/p
    return x + rand(low, high)
chill
0

In real math, the solution is just the one already provided:

return random() * (upper - lower) + lower

The problem is that floating point numbers only have a certain resolution. So what you can do is apply the above function and then add another random() value, scaled down to the missing part.

If I make a practical example it becomes clear what I mean:

E.g. take a random() return value from 0..1 with 2 digits of accuracy, i.e. 0.XY, and let lower be 100 and upper be 1100.

With the above algorithm you get as result 0.XY * (1100-100) + 100 = XY0.0 + 100. You will never see 201 as a result, because the final digit has to be 0.

The solution here would be to generate a random value again and add it multiplied by 10, so you gain one more digit of accuracy (here you have to take care that you don't exceed your given range, which can happen; in this case you have to discard the result and generate a new number).

Maybe you have to repeat this; how often depends on how many places the random() function delivers and how many you expect in your final result.

A standard IEEE format has limited precision (e.g. double: 53 bits). So when you generate a number this way, you never need to generate more than one additional number.

But you have to be careful that when you add the new number, you don't exceed your given upper limit. There are multiple solutions to this: first, if you exceed your limit, you start anew and generate a new number (don't cut off or clamp, as this changes the distribution).

The second possibility is to check the interval size of the missing lower-bit range, find the middle value, and generate an appropriate value that guarantees that the result will fit.
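A Python sketch of this digit-filling idea, using a hypothetical 2-decimal-digit base generator so the effect is visible (`rand_range` and `base_random` are illustrative names): the second draw is scaled down to supply the digits the first draw cannot reach.

```python
import random

DIGITS = 2                 # base generator accuracy: values of the form 0.XY
STEPS = 10 ** DIGITS

def base_random():
    # hypothetical coarse generator: 0.00, 0.01, ..., 0.99
    return random.randrange(STEPS) / STEPS

def rand_range(lower, upper):
    span = upper - lower
    while True:
        x = base_random() * span             # coarse part: resolution span/100
        x += base_random() * span / STEPS    # refine: resolution span/10000
        if x < span:                         # defensive; with this base
            return lower + x                 # generator it cannot overshoot
```

With lower=100 and upper=1100, the first draw alone can only produce multiples of 10 above 100 (201 is unreachable); the second draw restores a 0.1 resolution.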

flolo
0

You have to consider the amount of entropy that comes from each call to your RNG. Here is some C# code I just wrote that demonstrates how you can accumulate entropy from low-entropy source(s) and end up with a high-entropy random value.

using System;
using System.Collections.Generic;
using System.Security.Cryptography;

namespace SO_8019589
{
  class LowEntropyRandom
  {
    public readonly double EffectiveEntropyBits;
    public readonly int PossibleOutcomeCount;
    private readonly double interval;
    private readonly Random random = new Random();
    public LowEntropyRandom(int possibleOutcomeCount)
    {
      PossibleOutcomeCount = possibleOutcomeCount;
      EffectiveEntropyBits = Math.Log(PossibleOutcomeCount, 2);
      interval = 1.0 / PossibleOutcomeCount;
    }
    public LowEntropyRandom(int possibleOutcomeCount, int seed)
      : this(possibleOutcomeCount)
    {
      random = new Random(seed);
    }
    public int Next()
    {
      return random.Next(PossibleOutcomeCount);
    }
    public double NextDouble()
    {
      return interval * Next();
    }
  }

  class EntropyAccumulator
  {
    private List<byte> currentEntropy = new List<byte>();
    public double CurrentEntropyBits { get; private set; }
    public void Clear()
    {
      currentEntropy.Clear();
      CurrentEntropyBits = 0;
    }
    public void Add(byte[] entropy, double effectiveBits)
    {
      currentEntropy.AddRange(entropy);
      CurrentEntropyBits += effectiveBits;
    }
    public byte[] GetBytes(int count)
    {
      using (var hasher = new SHA512Managed())
      {
        count = Math.Min(count, hasher.HashSize / 8);
        var bytes = new byte[count];
        var hash = hasher.ComputeHash(currentEntropy.ToArray());
        Array.Copy(hash, bytes, count);
        return bytes;
      }
    }
    public byte[] GetPackagedEntropy()
    {
      // Returns a compact byte array that represents almost all of the entropy.
      return GetBytes((int)(CurrentEntropyBits / 8));
    }
    public double GetDouble()
    {
      // returns a uniformly distributed number on [0-1)
      return (double)BitConverter.ToUInt64(GetBytes(8), 0) / ((double)UInt64.MaxValue + 1);
    }
    public int GetInt(int maxValue)
    {
      // returns a uniformly distributed integer on [0-maxValue)
      return (int)(maxValue * GetDouble());
    }
  }

  class Program
  {
    static void Main(string[] args)
    {
      var random = new LowEntropyRandom(2);  // this only provides 1 bit of entropy per call
      var desiredEntropyBits = 64; // enough for a double
      while (true)
      {
        var adder = new EntropyAccumulator();
        while (adder.CurrentEntropyBits < desiredEntropyBits)
        {
          adder.Add(BitConverter.GetBytes(random.Next()), random.EffectiveEntropyBits);
        }
        Console.WriteLine(adder.GetDouble());
        Console.ReadLine();
      }
    }

  }
}

Since I'm using a 512-bit hash function, that is the maximum amount of entropy you can get out of the EntropyAccumulator. This could be fixed, if necessary.

Fantius
0

When you generate a random number with random(), you get a floating point number between 0 and 1 with an unknown precision (or density, as you name it).

And when you multiply it by a number (NUM), you lose precision: lg(NUM) digits (10-based logarithm). So if you multiply by 1000 (NUM=1000), you lose the last 3 digits (lg(1000) = 3).

You may correct this by adding a smaller random number to the original, one which supplies those missing 3 digits. But you don't know the precision, so you can't determine exactly where they are.

I can imagine two scenarios:

(X = range start, Y = range end)

1: you define the precision (PREC, e.g. 20 digits, so PREC=20) and consider it enough to generate a random number, so the expression will be:

( random() * (Y-X) + X ) + ( random() / 10 ^ (PREC-trunc(lg(Y-X))) )

with numbers: (X = 500, Y = 1500, PREC = 20)

( random() * (1500-500) + 500 ) + ( random() / 10 ^ (20-trunc(lg(1000))) )
( random() * 1000 + 500 ) + ( random() / 10 ^ (17) )

There are some problems with this:

  • two-phase random generation (how random will the result actually be?)
  • if the first random() returns 1, the result can be out of range
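Scenario 1 can be sketched in Python as follows (PREC and the correction term are taken directly from the expression above; `rand_prec` is a hypothetical name, and the sketch inherits both listed problems):

```python
import math
import random

def rand_prec(x, y, prec=20):
    # assumed precision PREC; the second draw fills in the digits lost
    # when the first draw is scaled by (y - x)
    scale = 10 ** (prec - math.trunc(math.log10(y - x)))
    return (random.random() * (y - x) + x) + random.random() / scale

# e.g. rand_prec(500, 1500) evaluates roughly
# (random() * 1000 + 500) + random() / 10**17
```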

2: guess the precision by random numbers

you define a number of tries (e.g. 4) to estimate the precision, by generating random numbers and counting the precision each time:

- 0.4663164 -> PREC=7
- 0.2581916 -> PREC=7
- 0.9147385 -> PREC=7
- 0.129141  -> PREC=6 -> 7, correcting by the average of the other tries

That's my idea.

deejayy
0

If I understand your problem correctly, it's that rand() generates finely spaced but ultimately discrete random numbers. And if we multiply it by (y-x), which is large, this spreads these finely spaced floating point values out in a way that misses many of the floating point values in the range [x,y]. Is that right?

If so, I think we have a solution already given by Dialecticus. Let me explain why he is right.

First, we know how to generate a random float and then add another floating point value to it. This may produce a round-off error due to addition, but it will be in the last decimal place only. Use doubles or something with finer numerical resolution if you want better precision. So, with that caveat, the problem is no harder than finding a random float in the range [0, y-x] with uniform density. Let's say y-x = z. Obviously, since z is a floating point number, it may not be an integer.

We handle the problem in two steps: first we generate the random digits to the left of the decimal point, and then we generate the random digits to the right of it. Doing both uniformly means their sum is uniformly distributed across the range [0,z] too. Let w be the largest integer <= z. To answer our simplified problem, we can first pick a random integer from the range {0,1,...,w}. Then, step #2 is to add a random float from the unit interval to this random number. This isn't multiplied by any possibly large values, so it has as fine a resolution as the numerical type can have. (Assuming you're using an ideal random floating point number generator.)

So what about the corner case where the random integer was the largest one (i.e. w) and the random float we added to it was larger than z - w, so that the random number exceeds the allowed maximum? The answer is simple: do all of it again and check the new result. Repeat until you get a value in the allowed range. It's an easy proof that a uniformly generated random number which is tossed out and generated again whenever it falls outside an allowed range results in a uniformly generated random number in the allowed range. Once you make this key observation, you see that Dialecticus met all your criteria.
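The two-step scheme described above, sketched in Python (`uniform_in` is a hypothetical name; the rejection step is what makes the result uniform on [0, z] despite the last integer's shorter accepted region):

```python
import math
import random

def uniform_in(x, y):
    # integer part from {0, ..., w}, fractional part from [0, 1),
    # rejecting candidates that exceed z = y - x
    z = y - x
    w = math.floor(z)
    while True:
        candidate = random.randint(0, w) + random.random()
        if candidate <= z:
            return x + candidate
```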

josh