I think you're somewhat mistaken.... let me try to describe the algorithm the way I see it:
First, assume that nextInt(big)
(nextInt, not nextLong) is correctly able to generate a well distributed range of values between 0
(inclusive) and big
(exclusive).
This nextInt(int)
function is used as part of nextLong(long)
So, the algorithm works by looping until the value is less than Integer.MAX_INT, at which point it uses nextInt(int)
. The more interesting thing is what it does before that...
Mathematically, if we take half a number, subtract half of it, and then subtract half of the half, and then half of the half of the half, etc. and we do that enough times it will tend toward zero. Similarly, if you take half of a number, and add it to half of the half of the half, etc. it will tend toward the original number.
What the algorithm does here is it takes half of the number. By doing integer division, if the number is odd, then there's a 'big' half and a 'small' half. The algorithm 'randomly' chooses one of those halves (either big or small).
Then, it randomly chooses to add that half, or not add that half to the output.
It keeps halving the number and (possibly) adding the half until the half is less than Integer.MAX_INT. At that point it simply computes the nextInt(half)
value and adds that to the result.
Assuming the initial long limit was much greater than Integer.MAX_VALUE
then the final result will get all the benefit of nextInt(int) for a large int value which is at least 32 bits of state, as well as 2 bits of state for all the higher bits above Integer.MAX_VALUE
.
The larger the orignal limit is (closer it is to Long.MAX_VALUE), the more times the loop will iterate. At worst case it will go 32 times, but for smaller limits it will go through fewer times. At worst case, for very large limits you will get 64 bits of randomness used for the loops, and then whatever is needed for the nextInt(half)
too.
EDIT: WalkThrough added - Working out the number of outcomes is harder, but, all values of long from 0
to Long.MAX_VALUE - 1
are possible outcomes. As 'proof' using nextLong(0x4000000000000000)
is a neat example to start from because all the halving processes will be even and it has bit 63 set.
Because bit 63 is set (the most significant bit available because bit64 would make the number negative, which is illegal) it means that we will halve the value 32 times before the value is <= Integer.MAX_VALUE (which is 0x000000007fffffff
- and our half
will be 0x0000000004000000
when we get there....). Because halving and bit-shifting are the same process it holds that there are as many halves to do as the difference between the highest bit set and bit 31. 63 - 31 is 32, so we halve things 32 times, and thus we do 32 loops in the while loop. The initial start value of 0x4000000000000000
means that as we halve the value, there will only be one bit set in the half, and it will 'walk' down the value - shifting 1 to the right each time through the loop.
Because I chose the initial value carefully, it is apparent that in the while loop the logic is essentially deciding whether to set each bit or not. It takes half of the input value (which is 0x2000000000000000) and decides whether to add that to the result or not. Let's assume for the sake of argument that all our loops decide to add the half to the offset, in which case, we start with an offset of 0x0000000000000000, and then each time through the loop we add a half, which means each time we add:
0x2000000000000000
0x1000000000000000
0x0800000000000000
0x0400000000000000
.......
0x0000000100000000
0x0000000080000000
0x0000000040000000 (this is added because it is the last 'half' calculated)
At this point our loop has run 32 times, it has 'chosen' to add the value 32 times, and thus has at least 32 states in the value (64 if you count the big/little half decision). The actual offset is now 0x3fffffffc0000000
(all bits from 62 down to 31 are set).
Then, we call nextInt(0x40000000) which, as luck would have it, produces the result 0x3fffffff, using 31 bits of state to get there. We add this value to our offset and get the result:
0x3fffffffffffffff
With a 'perfect' distribution of nextInt(0x40000000)
results we would have had a perfect coverage of the values 0x7fffffffc0000000
through 0x3fffffffffffffff
with no gaps. With perfect randomness in our while loop, our high bits would have been a perfect distribution of 0x0000000000000000
through 0x3fffffffc0000000
Combined there is full coverage from 0 through to our limit (exclusive) of 0x4000000000000000
With the 32 bits of state from the high bits, and the (assumed) minimum 31 bits of state from nextInt(0x40000000)
we have 63bits of state (more if you count the big/little half decision), and full coverage.