Uniform distribution of integers using floating point source

Question

The standard way to get a random integer in the range [0, n) in JavaScript - or any other language that only offers a random() function that returns a float in the range [0,1) - is to use Math.floor(Math.random() * n).

Now the math behind this is trivial assuming we're operating on the set of rational numbers. The question is: With all the complications of IEEE-754 floating point numbers is the resulting distribution actually really uniform?

Considering that the gap between one floating point number and the next higher one increases as they grow larger I would think that this should introduce some kind of bias towards smaller numbers.

Is this query in general? or are you trying to solve something? — THE AMAZING, Sep 02 '15 at 18:35
possible duplicate of [How random is JavaScript's Math.random?](http://stackoverflow.com/questions/1062902/how-random-is-javascripts-math-random) — THE AMAZING, Sep 02 '15 at 18:42
@Richard As far as I can see that's only about a misunderstanding of what *uniform distribution* means. — Voo, Sep 02 '15 at 18:44
"So if you select some at random, then that vast majority of selected numbers will have the same number of digits, because the vast majority of possible values have the same number of digits." — THE AMAZING, Sep 02 '15 at 18:46
So no, the resulting distribution is not actually really uniform. — THE AMAZING, Sep 02 '15 at 18:48
@Richard That's about the *number* of digits. That statement is true for any perfectly uniform distribution. — Voo, Sep 02 '15 at 18:49
If the number of digits are biased toward the number then that means there is a bias, meaning it is not uniform. — THE AMAZING, Sep 02 '15 at 18:51
@Richard If I have a uniform distribution of the numbers [1,10] (note the inclusive), you wouldn't assume a uniform distribution in the number of digits, i.e. that the 2 digit numbers would appear as often as the 1 digit numbers, right? That's all that post is about. — Voo, Sep 02 '15 at 18:53
It's very unlikely to be perfectly uniform: under the not unreasonable assumption that `Math.random` generates 2^53 distinct results with equal probability, we can only get uniform results for values of `n` that are exact divisors of `2^53`. For example, with `n=5`, if we divide `2^53` different elements into 5 bins, then no matter how we do it those bins can't all have the same number of elements in them. — Mark Dickinson, Sep 02 '15 at 19:38
See also http://bugs.python.org/issue23974 for a related issue in Python 2, where there's a serious bias as `n` approaches `2^53`. (The issue's fixed in Python 3.) — Mark Dickinson, Sep 02 '15 at 19:39
@Mark Clever way to approach the problem! Basically the same reasoning as to why `randomInt() % n` only works for very specific n's. — Voo, Sep 02 '15 at 20:13
Hmm. It turns out that Chrome, at least, creates `Math.random` results [using only 32 bits](https://code.google.com/p/chromium/issues/detail?id=276886), so there are going to be 2^32 distinct outcomes, not 2^53. But the principle remains. — Mark Dickinson, Sep 02 '15 at 20:15
@Mark while rici did answer the question as well, I find your explanation nicely to the point and very simple (putting together several well known facts to answer the question), so I'd love to give the answer to you if you were to write a post. Edit: Ah so much for that. — Voo, Sep 02 '15 at 20:39
Related: https://stackoverflow.com/questions/6046918/how-to-generate-a-random-integer-in-the-range-0-n-from-a-stream-of-random-bits/62920514#62920514 . The question there is reducible to this one by taking `Math.random()<0.5 ? 1 : 0` as an unbiased random bit. — Peter O., Apr 02 '22 at 23:13

Mark Dickinson · Accepted Answer · 2015-09-02T20:47:20.927

No, the resulting distribution is not going to be perfectly uniform, for most values of n. For small values, it'll be so close to uniform that you'd have a hard time detecting any difference from a uniform distribution, but as n gets larger the bias can become noticeable.

To illustrate, here's some Python code (not JavaScript, sorry, but the principle is the same):

from collections import Counter
from random import random

def badrand(n):
    return int(random() * n)

print(Counter(badrand(6755399441055744) % 3 for _ in range(10000000)))

This is producing 10 million random integers in the range [0, 6755399441055744), reducing each of those integers modulo 3, and counting the number of times the remainder is 0, 1, or 2. If we're generating those integers uniformly, we'd expect the remainders modulo 3 to be roughly evenly distributed, so we'd expect the counts to be similar.

Here's an example result from running this on my machine:

Counter({1: 3751915, 0: 3334643, 2: 2913442})

That is, a remainder of 1 is significantly more likely to occur than 0, which in turn is significantly more likely to occur than a remainder of 2. The differences here are way too big to be explained by random variation.

So what went wrong? Python's random() function is relatively high quality, based on the Mersenne Twister, so we're unlikely to be seeing statistical problems resulting from the base random number generator. What's happening is that random() generates one of 2^53 (roughly) equally likely outcomes - each outcome is a number of the form x / 2^53 for some integer x in the range [0, 2^53). Now in the badrand call, we're effectively mapping those outcomes to 6755399441055744 possible outputs. Now that value wasn't chosen at random (ha!); it's exactly 3/4 of 2^53. That means that under the most uniform distribution possible, 2/3 of the possible badrand output values are being hit by exactly one of the 2^53 possible random() output values, while the other 1/3 are being hit by two of the 2^53 possible random() output values. That is, some of the potential outputs are twice as likely to occur as others. So we're a long way from uniform.

You're going to see the same effect in JavaScript. In the case of Chrome, it appears that there are only 2^32 distinct results from Math.random(), so you should be able to find effects like the above with n smaller than (but close to) 2^32.

Of course, the same effect holds for small n, too: if n = 5, then because 5 is not a divisor of 2^32 there's no way we can perfectly evenly distribute all 2^32 possible Math.random() results between the 5 desired outcomes: the best we can hope for is that 4 of the 5 outcomes appear for 858993459 of the possible random() results each, while the fifth occurs for 858993460 of the random() results. But that distribution is going to be so close to uniform that it would be well-nigh impossible to find any statistical test to tell you differently. So for practical purposes, you should be safe with small n.

There's a related Python bug that might be interesting at http://bugs.python.org/issue9025. That bug was solved for Python 3 by moving away from the int(random() * n) method of computing these numbers. The bug still remains in Python 2, though.

No apology needed - I find python *way* more attractive than JavaScript or TypeScript - the "batteries included" part of Python just doesn't make me think about issues such as this one because there's a nice `random.randInt()` function that (as we can see not necessarily) takes care of it :-) — Voo, Sep 02 '15 at 20:41

score 2 · Answer 2 · answered Sep 02 '15 at 19:57

If Math.random (or equivalent) generated a uniformly-distributed bit pattern out of those bit patterns corresponding to floating point numbers in the range [0, 1), then it would produce an extremely biased sample. There are as many representable floating point number in [0.25, 0.5) as there are in [0.5, 1.0), which is also the same number of representable values in [0.125, 0.25). And so on. In short, the uniformly-distributed bit patterns would result in only one out of a thousand values being between 0.5 and 1.0. (assuming double-precision floating point numbers.)

Fortunately, that's not what Math.random does. One simple way of getting a uniformly distributed number (rather than bit pattern) is to generate a uniformly distributed bit pattern in [1.0, 2.0), and then subtract 1.0; that's a fairly common strategy.

Regardless, the end result of Math.floor(Math.random() * n) is not quite uniformly distributed unless n is a power of 2, because of quantification bias. The number of possible floating point values which could be returned by Math.random is a power of 2, and if n is not a power of 2, then it is impossible to distribute the possible floating point values precisely evenly over all values of integers in [0, n). If Math.random returns a double-precision floating pointer number and n is not huge, this bias is small, but it certainly exists.

score 0 · Answer 3 · edited Apr 13 '17 at 12:44

0

According to http://es5.github.io/x15.8.html#x15.8.2.14

the functionality of Math.random

Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments.

check out this post: https://stats.stackexchange.com/questions/40384/fake-uniform-random-numbers-more-evenly-distributed-than-true-uniform-data

this has become above my head, sorry i have nothing left to contribute

edited Apr 13 '17 at 12:44

Community

1
1

answered Sep 02 '15 at 18:55

THE AMAZING

1,496
2
16
38

That gives a gaussian distribution (also called normal distribution). A uniform distribution is *much* simpler: Every number should appear equally often. – Voo Sep 02 '15 at 18:56

Louis Ricci · Answer 4 · 2015-09-02T19:41:43.600

0

Assuming random() is returning a number between 0..1.

If the result is a single precision float than thats only 23bits of entropy based on the mantissa.

If the result is a double precision float than thats only 52bits of entropy based on the mantissa.

So floor(random() * N) would only be uniform where N is less than 2^24 or 2^53.

EDIT Here's some info on largest consecutive integer for floating point http://www.mathworks.com/help/matlab/ref/flintmax.html

edited Sep 02 '15 at 19:41

answered Sep 02 '15 at 19:35

Louis Ricci

20,804
5
48
62

It would be 53 bits of entropy for a double: the usual approach is to produce numbers of the form `x / 2^53` where `x` is an integer that's (roughly) uniformly distributed in `[0, 2^53)`. Yes, the significand only has 52 bits if you exclude the hidden bit, but you need to take the exponent into account as well. – Mark Dickinson Sep 02 '15 at 19:41
@MarkDickinson - I was in the process of editing that and adding the largest consecutive integer reference when you commented. – Louis Ricci Sep 02 '15 at 19:49
I should've been clearer: True, this can even in the best of all situations only work as long as a double can even represent every number. What I'm wondering is whether it is actually uniform below that. – Voo Sep 02 '15 at 19:55

Bert te Velde · Answer 5 · 2015-09-02T19:46:32.247

I assume that your remark that "the gap between one floating point number and the next higher one increases as they grow larger" is based on the following:

in IEEE-754 you have a fixed-size mantissa, which allows for uniform "random" values in the range [1,2) say, and there's an equal number of possible values in [2,4) which is twice as large a range, so we get a 2-fold spacing between the possible values, again twice as large for [4,8), etcetera.

Now, I haven't checked out the technical details behind ".., using an implementation-dependent algorithm or strategy", when they talk about the properties of the random numbers generated for [0,1), but since the above consideration is so trivial, I do assume that the random-generator programmers have been aware of this and took care of it with an "implementation-dependent algo...".

Therefore, being a naive guy, I do believe that there's nothing to worry about with respect to (my assumption of) your reason for suspicion. In fact, I might take it that, if you can generate uniform and random values for the mantissa, then setting always the same exponent, such that the values belong to [1,2), you subtract 1 from everything and have an appropriate distribution for [0,1).

The question is about whether the distribution over [0, n) will be uniform when applying that transformation assuming that [0,1) is uniform. — Voo, Sep 02 '15 at 19:57
What makes you doubt that "simply" multiplying by n destroy uniformity? Of course, roundoff may affect the precise "copying", but that roundoff may also be expected to be rather homogeneous and for instance if n is a multiple of 2, the whole thing amounts to only shifting the exponent and uniformity will not be affected at all. — Bert te Velde, Sep 03 '15 at 06:06
Read Mark's answer for an explanation of why it happens, including code that shows the bias. Yes you won't have problems if n is a multiple of 2 but in any other case you will. The insight here is that this is just an integer random generator with period [0, 2^(k+1)) where k is the significant of the float and all the same restrictions apply. — Voo, Sep 03 '15 at 06:11

Uniform distribution of integers using floating point source

5 Answers5

Linked