2

I can use one of two methods to create a pseudo random number sequence that has two important characteristics - (1) it is reproducible on different machines, and (2) the sequence never repeats a number within range until all have been emitted.

My question is - do either of these methods have potential issues with regards to portability (OS, Python versions, etc)? For example, does anyone know if I would get one set of results on one system but a different one on another when XXX is true?

I'm not really asking for advice on which method to use per se, only if I should watch out for X on Y system when Z is true.

I have tried on a few versions of Linux, all 64bit, and they seem consistent, but I don't have easy access to Windows or 32 bit versions.

Note that they don't produce the same ranges as each other, but that's OK for my purposes. The numbers just have to appear random to the human eye.

Method 1
generates a random sample from a range using native Python library functions. It's slow if I use large ranges (10m or more) but OK for relatively small ones, and is much easier to understand for people without a degree in maths :

import random
random.seed(5)
x = random.sample(range(10000,99999),89999)
for i in range(10):
   print(x[i])

Method 2
uses an algorithm not from a Python library :
(https://en.wikipedia.org/wiki/Linear_congruential_generator)
It's super fast even for massive ranges, but harder to understand and therefore spot potential issues with :

def lcg(modulus, a, c, seed):
  while True:
    seed = (a * seed + c) % modulus
    yield seed


m = 10000019
c = int(m/2)
a = 5653
s = a

g = lcg(m,a,c,s)
for _ in range(10):
  print(next(g))

Note I am more than open to alternatives; the original question was asked here : https://math.stackexchange.com/questions/3289084/generate-a-pseudo-random-predictable-non-repeating-integer-sequence-purely-math

Peter O.
  • 32,158
  • 14
  • 82
  • 96
David Wylie
  • 637
  • 1
  • 8
  • 24
  • @Dukeling - the RNG only has to be human-good (ie look like it's random) just so long as it doesn't repeat and can be reproduced from machine to machine. I suppose the question is nebulous, I guess portability issues might include how things are worked out under the hood that might make a seed on one machine behave differently to another. I tend to find other more experienced people just sort of "know" these things, so I thought it was worth asking. I will of course test as much as I can. – David Wylie Jul 11 '19 at 09:24
  • Just use a specific RNG. If your LCG meets your needs then use it, no reason to fret over it. python and almost every other programming language are all about portability, they wouldn't be of much use if their behavior changed from machine to machine. – President James K. Polk Jul 11 '19 at 12:11

3 Answers3

1

Most portable version, IMO, would be LCG with period equal to natural word size of the machine. It uses register overflow for module which makes it even faster. You have to use NumPy datatypes to do that, here is simple code, constants a, c are taken from Table 4 here

import numpy as np

def LCG(seed: np.uint64, a: np.uint64, c: np.uint64) -> np.uint64:
    with np.errstate(over='ignore'):
        while True:
            seed = (a * seed + c)
            yield seed

a = np.uint64(2862933555777941757)
c = np.uint64(1)

rng64 = LCG(np.uint64(17), a, c)

print(next(rng64))
print(next(rng64))
print(next(rng64))

Both Linux x64 and Windows x64 as well as OS X VM works exactly the same.

Concerning reproducibility, the only good is to store first couple numbers and compare them with LCG output during app initialization stage - if they are ok, you proceed further.

Another feature of the LCG I like is it's ability to jump ahead in log2(N) time, where N is number of skips. I could provide you with code to do that. Using jump ahead you could ensure non-overlapping sequences for parallel independent random streams

UPDATE

Here is translation of my C code into Python/NumPy, seems to work. It could skip forward as well as backward in logarithmic time.

import numpy as np

class LCG(object):

    UZERO: np.uint64 = np.uint64(0)
    UONE : np.uint64 = np.uint64(1)

    def __init__(self, seed: np.uint64, a: np.uint64, c: np.uint64) -> None:
        self._seed: np.uint64 = np.uint64(seed)
        self._a   : np.uint64 = np.uint64(a)
        self._c   : np.uint64 = np.uint64(c)

    def next(self) -> np.uint64:
        self._seed = self._a * self._seed + self._c
        return self._seed

    def seed(self) -> np.uint64:
        return self._seed

    def set_seed(self, seed: np.uint64) -> np.uint64:
        self._seed = seed

    def skip(self, ns: np.int64) -> None:
        """
        Signed argument - skip forward as well as backward

        The algorithm here to determine the parameters used to skip ahead is
        described in the paper F. Brown, "Random Number Generation with Arbitrary Stride,"
        Trans. Am. Nucl. Soc. (Nov. 1994). This algorithm is able to skip ahead in
        O(log2(N)) operations instead of O(N). It computes parameters
        A and C which can then be used to find x_N = A*x_0 + C mod 2^M.
        """

        nskip: np.uint64 = np.uint64(ns)

        a: np.uint64 = self._a
        c: np.uint64 = self._c

        a_next: np.uint64 = LCG.UONE
        c_next: np.uint64 = LCG.UZERO

        while nskip > LCG.UZERO:
            if (nskip & LCG.UONE) != LCG.UZERO:
                a_next = a_next * a
                c_next = c_next * a + c

            c = (a + LCG.UONE) * c
            a = a * a

            nskip = nskip >> LCG.UONE

        self._seed = a_next * self._seed + c_next    


np.seterr(over='ignore')

a = np.uint64(2862933555777941757)
c = np.uint64(1)
seed = np.uint64(1)

rng64 = LCG(seed, a, c) # initialization

print(rng64.next())
print(rng64.next())
print(rng64.next())

rng64.skip(-3) # back by 3
print(rng64.next())
print(rng64.next())
print(rng64.next())

rng64.skip(-3) # back by 3
rng64.skip(2) # forward by 2
print(rng64.next())

Anyway, summary of the LCG RNG:

  1. With good constants (see reference to L'Ecuyer paper) it will cover whole [0...264) range without repeating itself. Basically perfect [0...264) -> [0...264) mapping, you could set 0,1,2,3,... as input and get whole range output
  2. It is reversible, you could get previous seed back so mapping is actually bijection, [0...264) <-> [0...264). See Reversible pseudo-random sequence generator for details
  3. It has logarithmic skip forward and backward, so there is no problem to find suitable intervals for parallel computation - start with single seed, and then next thread would be skip(seed, 264/N), next thread skip(seed, 264/N * 2) and so on and so forth. Guaranteed to not overlap
  4. It is simple and fast, though not a very high quality RNG
Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
  • "I could provide you with code to do that" - yes please! I'm finding this whole subject fascinating. Is there a way to restrict the max. digit length in your example? I'm trying where possible to keep it to 7 digits. – David Wylie Jul 12 '19 at 07:00
  • 1
    @DavidWylie Please check update. Not sure about `way to restrict the max. digit length`. If you're talking about multiplier in LCG, it is taken from the paper and known to have good FoM. You could take the code and make it into 32bit LCG, just replace np.uint64 with np.uint32 and it should work, and then pick good values for `a` and `c` from prof.L'Ecuyer paper, and LCG constant(s) would be smaller – Severin Pappadeux Jul 14 '19 at 01:28
0

LCGs are nice. If you want to make LCGs more understandable, you may implement it recursively instead of iteratively to highlight the recursive formula it's based upon. Do it only if you don't worry too much about complexity.

Otherwise, I think method 2 is clear enough for a PRNG.

Elbattore
  • 86
  • 8
  • But are they both portable? I'll edit the question to define that better, but I mean python versions, OS types & versions, etc. – David Wylie Jul 11 '19 at 07:55
  • I don't know how the random lib works (is it based on OS primitives ? What PRNG is it based on? ). Solution 2 works on any current python version 2.7 and 3, and any OS that has python. – Elbattore Jul 11 '19 at 08:32
0

There are many ways an algorithm (particularly a pseudorandom number generator) can have inconsistent results across computers. The most notable way is if the algorithm relies on floating-point numbers (which almost always are limited precision) and floating-point rounding.

Some programming languages are more prone to portability issues than others. For example, unlike in Python, inconsistent results can arise in C or C++ due to—

  • Undefined behavior of signed integer overflows, or
  • how the lengths of certain data types, notably int and long, are defined differently across compilers.

I am not aware of any way the Python code in method 2 can deliver inconsistent results across computers. On the other hand, whether method 1 can do so depends on whether random.sample is implemented the same way across all Python versions you care about and across all computers.

Peter O.
  • 32,158
  • 14
  • 82
  • 96