This result doesn't surprise me given how floating-point numbers are represented. Let's suppose we had a very short floating-point type with only 4 bits of precision. If we were to generate a random number between 0 and 1, distributed uniformly, there would be 16 possible values:
0.0000
0.0001
0.0010
0.0011
0.0100
...
0.1110
0.1111
If that's how they looked in the machine, you could test the low-order bit to get a 50/50 distribution. However, IEEE floats are represented as a power of 2 times a mantissa; one field in the float is the power of 2 (plus a fixed offset). The power of 2 is selected so that the "mantissa" part is always a number >= 1.0 and < 2.0. This means that, in effect, the numbers other than 0.0000 would be represented like this:
0.0001 = 2^(-4) x 1.000
0.0010 = 2^(-3) x 1.000
0.0011 = 2^(-3) x 1.100
0.0100 = 2^(-2) x 1.000
...
0.0111 = 2^(-2) x 1.110
0.1000 = 2^(-1) x 1.000
0.1001 = 2^(-1) x 1.001
...
0.1110 = 2^(-1) x 1.110
0.1111 = 2^(-1) x 1.111
(The 1 before the binary point is an implied value; for 32- and 64-bit floats, no bit is actually allocated to hold this 1.)
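The listing above can be reproduced mechanically. The following is a minimal sketch for the 4-bit toy type only; the helper names `normalize` and `format_row` are made up for illustration and do not come from any library.

```python
def normalize(k, bits=4):
    """Split the toy value k / 2**bits (0 < k < 2**bits) into (exponent, fraction).

    The value equals 2**exponent * 1.fff, where fff is the returned fraction
    written with bits - 1 binary digits (the leading 1 is implied).
    """
    if not 0 < k < 2 ** bits:
        raise ValueError("k must be a nonzero value of the toy type")
    top = k.bit_length() - 1          # index of the highest set bit
    exponent = top - bits             # power of 2 that moves the leading 1 in front of the point
    # Drop the leading 1, then shift the remaining bits left; zeros fill the low end.
    fraction = (k - (1 << top)) << (bits - 1 - top)
    return exponent, fraction


def format_row(k, bits=4):
    """Render one value in the same style as the listing above."""
    exponent, fraction = normalize(k, bits)
    return f"0.{k:0{bits}b} = 2^({exponent}) x 1.{fraction:0{bits - 1}b}"


for k in range(1, 16):
    print(format_row(k))
```

The left shift in `normalize` is the important part: the smaller the value, the more zero bits get pushed into the low end of the mantissa.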
Looking at the representations listed above should show why, if you convert the representation to bits and look at the low bit, you will get zero 75% of the time. All values less than 0.5 (binary 0.1000), which make up half of the possible values, have their mantissas shifted over, so a 0 always appears in the low bit; for the other half, the low bit is 0 only half the time. The situation is essentially the same when the mantissa has 52 bits (not including the implied 1), as a double does.
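The same effect can be checked empirically on real 64-bit doubles. This sketch assumes the generator builds each value as a 53-bit random integer divided by 2^53; that is a common construction, but it is an assumption here, not something stated above. The function names are hypothetical.

```python
import random
import struct


def low_mantissa_bit(x):
    """Return the lowest stored mantissa bit of x viewed as a 64-bit IEEE double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits & 1


def fraction_low_bit_zero(samples=100_000, seed=0):
    """Estimate how often the low mantissa bit is 0 for uniform values in [0, 1).

    Assumption: each value is a 53-bit random integer scaled by 2**-53.
    """
    rng = random.Random(seed)
    zeros = 0
    for _ in range(samples):
        x = rng.getrandbits(53) / 2 ** 53   # exact: division by a power of 2
        if low_mantissa_bit(x) == 0:
            zeros += 1
    return zeros / samples


print(fraction_low_bit_zero())
```

Under that assumption the printed fraction should come out close to 0.75, with the exact figure depending on the seed and sample count.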
(Actually, as @sneftel suggested in a comment, we could include more than 16 possible values in the distribution, by generating:
0.0001000 with probability 1/128
0.0001001 with probability 1/128
...
0.0001111 with probability 1/128
0.001000 with probability 1/64
0.001001 with probability 1/64
...
0.01111 with probability 1/32
0.1000 with probability 1/16
0.1001 with probability 1/16
...
0.1110 with probability 1/16
0.1111 with probability 1/16
But I'm not sure it's the kind of distribution most programmers would expect, so it probably isn't worthwhile. Plus it doesn't gain you much when the values are used to generate integers, as random floating-point values often are.)
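For the toy type, the finer-grained distribution described in the parenthetical can be enumerated as follows. This is only a sketch under stated assumptions: it covers just the four exponent ranges that appear in the list (values from 0.0001 up), and the function name and its parameters are hypothetical. How values below 0.0001 would be handled is not specified above, so they are left out.

```python
from fractions import Fraction


def finer_distribution(min_exp=-4, frac_bits=3):
    """Return (value, probability, low_bit) triples for the toy type.

    Within each power-of-2 range every mantissa is equally likely, and each
    value's probability equals the spacing between neighbouring values there.
    """
    triples = []
    for exp in range(min_exp, 0):               # exponents -4, -3, -2, -1
        scale = Fraction(2) ** exp
        for frac in range(2 ** frac_bits):      # all 3-bit mantissa fractions
            value = scale * (1 + Fraction(frac, 2 ** frac_bits))
            prob = scale / 2 ** frac_bits
            triples.append((value, prob, frac & 1))
    return triples


triples = finer_distribution()
covered = sum(p for _, p, _ in triples)
low_bit_one = sum(p for _, p, bit in triples if bit == 1)
print(covered, low_bit_one)
```

With this scheme the low mantissa bit is 1 for exactly half of the covered probability, because every mantissa pattern is equally likely within each exponent range.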