I'm running MATLAB R2012a. I tried to write a function that imitates randi using only rand, producing the same output when the same arguments are passed and the same seed is provided. I experimented in the command window, and here's what I got:

>> s = rng;
>> R1 = randi([2 20], 3, 5)

R1 =

     2    16    11    15    14
    10    17    10    16    14
     9     5    14     7     5

>> rng(s)
>> R2 = 2+18*rand(3, 5)

R2 =

    2.6200   15.7793   10.8158   14.7686   14.2346
    9.8974   16.3136   10.0206   15.5844   13.7918
    8.8681    5.3637   13.6336    6.9685    4.9270

>> 

A quick comparison led me to believe that there is some link between the two: each integer in R1 is within plus or minus one of the corresponding element in R2. Nonetheless, I could not get any further: I tried ceil, floor, fix and round, but none of them seems to work.

1 Answer

randi([2 20]) generates integers between 2 and 20, both included. That is, it can generate 19 different values, not 18.

19 * rand

generates values uniformly distributed within the half-open interval [0,19); flooring them gives you uniformly distributed integers in the range [0,18].

Thus, in general,

x = randi([a,b]);
y = floor(rand * (b-a+1)) + a;

should yield numbers with the same property. From OP’s experiment it looks like they might generate the same sequence, but this cannot be guaranteed, and the two sequences likely diverge eventually.
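As a quick check of the floor-based formula, the sketch below reuses only the numbers printed in the question. It recovers the underlying rand values from R2 (to the four decimals shown, so this is an approximation) and applies the formula with 19 instead of 18. The variable names u, R3 and same are introduced here for illustration.

```matlab
% Values copied from the question's command-window output.
R1 = [ 2 16 11 15 14
      10 17 10 16 14
       9  5 14  7  5];
R2 = [2.6200 15.7793 10.8158 14.7686 14.2346
      9.8974 16.3136 10.0206 15.5844 13.7918
      8.8681  5.3637 13.6336  6.9685  4.9270];

a = 2; b = 20;
u  = (R2 - a) / (b - a);            % recover the rand values, since R2 = 2 + 18*rand
R3 = a + floor((b - a + 1) * u);    % floor-based formula with b-a+1 = 19 outcomes

same = isequal(R1, R3);             % true when every entry matches R1
disp(same)
```

For these 15 values the two matrices agree, which is consistent with the observation in the question. It says nothing about whether the agreement holds for every draw.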

Why? It is likely that randi is not implemented in terms of rand, but in terms of its underlying random generator, which produces integers. To go from a random integer x in a large range ([0,N-1]) to one in a small range ([0,n-1]), you would normally use the modulo operator (mod(x,n)) or a floored division like above, but reject a small subset of the raw values that would otherwise skew the distribution. This other answer gives a detailed explanation. I like to think of it in terms of examples:

Say random values are in the range [0,2^16-1] (N=2^16) and you want values in the range [0,18] (n=19). mod(2^16,19)=5. That is, the largest 5 values that can be generated by the random number generator are mapped to the lowest 5 values of the output range (assuming the modulo method), leaving those numbers slightly more likely to be generated than the rest of the output range. Each of these lowest 5 values is produced by floor(N/n)+1 = 3450 of the N raw values, whereas each of the others is produced by only floor(N/n) = 3449. This is bad. [Using floored division instead of modulo yields a different distribution of the unevenness, but the end result is the same: some numbers are slightly more likely than others.]
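The counts in this example can be verified by brute force. The sketch below enumerates all N raw values of the toy 16-bit generator assumed above and tallies how many land on each output under the modulo method. The names r and counts are introduced here for illustration.

```matlab
N = 2^16;                 % size of the toy raw range [0, N-1]
n = 19;                   % size of the target range [0, n-1]

r = mod(N, n);            % number of over-represented outputs
raw = (0:N-1)';           % every raw value, as a column vector
counts = accumarray(mod(raw, n) + 1, 1);   % counts(k+1) = raw values mapped to k

disp(r)                   % how many outputs get one extra raw value
disp([min(counts) max(counts)])
```

Outputs 0 to r-1 each receive floor(N/n)+1 raw values and the rest receive floor(N/n), so the bias is a difference of one raw value per output.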

To solve this issue, a correct implementation does as follows: if the random generator returns one of the values that are floor(N/n)*n or higher, you throw it away and try again. The chance of this happening is very small, of course, with a typical random number generator that uses N=2^64.
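A minimal sketch of this rejection step, again for the toy case N=2^16 and n=19. The handle draw_raw is a hypothetical stand-in for the raw integer generator, which MATLAB does not expose this way; here it is simulated with rand purely for illustration. This is not a description of how randi is actually implemented.

```matlab
N = 2^16;                          % toy raw range [0, N-1]
n = 19;                            % target range [0, n-1]
limit = floor(N / n) * n;          % raw values >= limit are rejected

draw_raw = @() floor(N * rand);    % hypothetical stand-in for the raw generator

x = draw_raw();
while x >= limit                   % rare: only N - limit raw values are rejected
    x = draw_raw();
end
y = mod(x, n);                     % unbiased value in [0, n-1]

% Every output is now backed by the same number of accepted raw values.
counts = accumarray(mod((0:limit-1)', n) + 1, 1);
disp(all(counts == floor(N / n)))
```

Adding the lower bound a to y then gives a value in [a, a+n-1], as in the formula near the top of this answer.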

Though we don't know how randi is implemented, we can be fairly certain that it follows the correct implementation described here. So your sequence based on rand might be right for millions of numbers, but then start deviating.


Interestingly enough, Octave's randi is implemented as an M-file, so we can see how they do it. And it turns out it uses the wrong algorithm shown at the top of this answer, based on rand:

 ri = imin + floor ( (imax-imin+1)*rand (varargin{:}) );

Thus, Octave's randi is biased!

Cris Luengo