I am investigating whether two different software packages can be made to agree on a sequence of pseudo-random numbers. I am as interested in understanding every possible point of divergence as I am in actually getting them to agree.
Why? I work in a data shop that uses many different software packages (Stata, R, Python, SAS, probably others). There has recently been interest in QCing outputs by replicating processes in another language. For any process that involves random numbers, it would be helpful if we could provide a series of steps ("set this option", etc.) that allow the two packages to agree. If that's not feasible, I'd like to be able to articulate where the failure points are.
A simple example:
R and Python both use the Mersenne Twister as their default random number generator. I set the same seed in each, then draw a sample and inspect the PRNG's internal state. Neither the state nor the sample agrees between the two.
R (3.2.3, 64-bit):
set.seed(20160201)
.Random.seed              # integer vector holding the generator's internal state
sample(c(1, 2, 3, 4, 5))  # random permutation of the five values
Python (3.5.1, 64-bit):
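import random
random.seed(20160201)
print(random.getstate())                  # (version, internal state tuple, gauss_next)
print(random.sample([1, 2, 3, 4, 5], 5))  # random permutation of the five values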
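One candidate point of divergence is how each package turns the integer seed into the 624-word Mersenne Twister state. The sketch below ports the two seeding routines from the reference MT19937 implementation (`init_genrand` and `init_by_array`) to plain Python and compares their output with what `random.seed` actually produces. The helper functions are illustrative ports, not part of any library, and the comparison rests on the assumption that CPython passes a small non-negative integer seed to `init_by_array` as a single 32-bit word.

```python
import random

N = 624  # number of 32-bit words in the MT19937 state


def init_genrand(seed):
    """Illustrative port of the reference seeding from one 32-bit integer."""
    mt = [seed & 0xFFFFFFFF]
    for i in range(1, N):
        prev = mt[i - 1]
        mt.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
    return mt


def init_by_array(key):
    """Illustrative port of the reference seeding from a list of 32-bit words."""
    mt = init_genrand(19650218)
    i, j = 1, 0
    for _ in range(max(N, len(key))):
        prev = mt[i - 1]
        mt[i] = ((mt[i] ^ ((prev ^ (prev >> 30)) * 1664525)) + key[j] + j) & 0xFFFFFFFF
        i += 1
        j += 1
        if i >= N:
            mt[0] = mt[N - 1]
            i = 1
        if j >= len(key):
            j = 0
    for _ in range(N - 1):
        prev = mt[i - 1]
        mt[i] = ((mt[i] ^ ((prev ^ (prev >> 30)) * 1566083941)) - i) & 0xFFFFFFFF
        i += 1
        if i >= N:
            mt[0] = mt[N - 1]
            i = 1
    mt[0] = 0x80000000
    return mt


SEED = 20160201
random.seed(SEED)
version, internal, gauss_next = random.getstate()
python_words = list(internal[:N])  # the last entry of `internal` is the position index

# Assumption: the seed fits in one 32-bit word, so the key is a one-element list.
matches_init_by_array = python_words == init_by_array([SEED])
matches_init_genrand = python_words == init_genrand(SEED)
print("matches init_by_array:", matches_init_by_array)
print("matches init_genrand: ", matches_init_genrand)
```

If only one of the two comparisons comes out true, that shows the same generator and the same seed can still start from different states, depending on the seeding routine. The same check would be needed for R's `set.seed`, whose seeding routine is not reproduced here. Even with identical states, the way each package maps raw generator output to a sample is a further place where results can diverge.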