
Something odd happens in R when I call set.seed(0) and then set.seed(1):

set.seed(0)
sample(1:100,size=10,replace=TRUE)
#### [1] 90 27 38 58 91 21 90 95 67 63


set.seed(1)
sample(1:100,size=10,replace=TRUE)
#### [1] 27 38 58 91 21 90 95 67 63  7

When changing the seed from 0 to 1, I get the exact same sequence, but shifted over by 1 cell!

Note that if I do set.seed(2), I do get what appears to be a completely different (random?) vector.

set.seed(2)
sample(1:100,size=10,replace=TRUE)
#### [1] 19 71 58 17 95 95 13 84 47 55

Anyone know what's going on here?

bigO6377

2 Answers


This applies to the R implementation of the Mersenne-Twister RNG.

set.seed() takes the provided seed and scrambles it (in the C function RNG_Init):

for(j = 0; j < 50; j++)
  seed = (69069 * seed + 1);

That scrambled number (seed) is then scrambled 625 times to fill out the initial state for the Mersenne-Twister:

for(j = 0; j < RNG_Table[kind].n_seed; j++) {
  seed = (69069 * seed + 1);
  RNG_Table[kind].i_seed[j] = seed;
}

We can examine the initial state for the RNG using .Random.seed:

set.seed(0)
x <- .Random.seed

set.seed(1)
y <- .Random.seed

table(x %in% y)

You can see from the table that there is a lot of overlap. Compare this to seed = 3:

set.seed(3)
z <- .Random.seed

table(z %in% x)
table(z %in% y)

Going back to the case of 0 and 1, if we examine the state itself (ignoring the first two elements of the vector, which do not apply to what we are looking at), we can see that the state is offset by one:

x[3:10]
# 1280795612 -169270483 -442010614 -603558397 -222347416 1489374793  865871222
# 1734802815

y[3:10] 
# -169270483 -442010614 -603558397 -222347416 1489374793  865871222 1734802815
# 98005428

Since the values selected by sample() are based on these numbers, you get the odd behavior.
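The reason for the offset can be read off the recurrence: one step starting from 0 gives 69069 * 0 + 1 = 1, so the sequence generated from seed 0 is the sequence generated from seed 1 delayed by one step. The following is a minimal standalone sketch of the two loops quoted above, not R's actual source; the names lcg_step, fill_state and is_shifted_copy are hypothetical, and it assumes the seed behaves as an unsigned 32-bit integer that wraps modulo 2^32. It does not implement the Mersenne-Twister itself.

```c
#include <stddef.h>
#include <stdint.h>

/* One step of the recurrence quoted above.
   The result is truncated to 32 bits, i.e. taken modulo 2^32. */
static uint32_t lcg_step(uint32_t seed)
{
    return (uint32_t)(69069u * seed + 1u);
}

/* Sketch of the seeding procedure (hypothetical helper, not R's API):
   50 warm-up steps, then n further steps, storing each result. */
static void fill_state(uint32_t seed, uint32_t *state, size_t n)
{
    for (int j = 0; j < 50; j++)
        seed = lcg_step(seed);
    for (size_t j = 0; j < n; j++) {
        seed = lcg_step(seed);
        state[j] = seed;
    }
}

/* Returns 1 if b[j] == a[j + shift] for every j where both entries exist,
   i.e. if b is a copy of a moved left by `shift` positions. */
static int is_shifted_copy(const uint32_t *a, const uint32_t *b,
                           size_t n, size_t shift)
{
    for (size_t j = 0; j + shift < n; j++)
        if (b[j] != a[j + shift])
            return 0;
    return 1;
}
```

Filling one array with fill_state(0, a, 625) and another with fill_state(1, b, 625), is_shifted_copy(a, b, 625, 1) should return 1, which mirrors the x[3:10] versus y[3:10] comparison. Any two seeds s and 69069 * s + 1 (mod 2^32) are related the same way.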

Christopher Louden
  • Thanks! This definitely adds some clarity to this issue. I'll have to stare harder at this code to see how/why 0 and 1 come out so similarly. – bigO6377 Feb 11 '14 at 21:49
  • Welcome. Keep in mind that the arithmetic wraps modulo 2^32 and that R shows the result as a signed 32-bit integer (maximum 2^31 - 1), so large values appear as negative numbers. – Christopher Louden Feb 11 '14 at 21:53
  • Notice that if you `set.seed(69070)` then you get an additional shift by 1 (makes sense given the 1st for loop above). We could extend the loop to find the next shift by 1, etc. The 0, 1 combination is only notable because the two seeds are adjacent; without seeing the loop, it is unlikely that anyone would happen to compare 1 and 69070. – Greg Snow Feb 11 '14 at 22:09
  • @bigO6377 Looking at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html it seems that the MT authors are aware of the problem and fixed it ("improved initialization" -- don't know exactly when), so you should ask the R guys if they can update their version – loreb Feb 12 '14 at 16:35
  • According to http://wellington.pm.org/archive/200704/randomness/mt19937.pl they fixed it at least 7 years ago... – loreb Feb 12 '14 at 16:55

As the other answer shows, seeds 0 and 1 produce nearly identical initial states. In addition, the Mersenne Twister PRNG has a severe limitation: "almost similar initial states will take a long time to diverge".

It may therefore be advisable to use an alternative such as the WELL PRNG (available in the randtoolbox package).

Nishanth