This is a Park-Miller pseudo-random number generator:
    def gen1(a=783):
        while True:
            a = (a * 48271) % 0x7fffffff
            yield a
The 783 is just an arbitrary seed. The 48271 is the multiplier recommended by Park and Miller in the original paper (PDF: Park, Stephen K.; Miller, Keith W. (1988). "Random Number Generators: Good Ones Are Hard To Find").
I would like to improve the performance of this LCG. The literature describes a way to avoid the division using bitwise tricks (source):
A prime modulus requires the computation of a double-width product and an explicit reduction step. If a modulus just less than a power of 2 is used (the Mersenne primes 2^31 − 1 and 2^61 − 1 are popular, as are 2^32 − 5 and 2^64 − 59), reduction modulo m = 2^e − d can be implemented more cheaply than a general double-width division using the identity 2^e ≡ d (mod m).
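To make the identity concrete (my own illustration, not from the quoted source): for m = 2**31 − 1 we have e = 31 and d = 1, so splitting a product at bit 31 and adding the high half back onto the low half preserves the residue:

```python
# Illustration of the identity 2**e ≡ d (mod m) for m = 2**e - d.
# With m = 2**31 - 1 (e = 31, d = 1), x = hi * 2**31 + lo implies
# x ≡ hi + lo (mod m), so the high bits just fold back onto the low bits.
m = 2**31 - 1
x = 123456789 * 48271           # a double-width product
hi, lo = x >> 31, x & m         # split at bit 31
assert (hi + lo) % m == x % m   # folding preserves the residue
```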
Noting that the modulus 0x7fffffff is actually the Mersenne prime 2**31 - 1, here is the idea implemented in Python:
    def gen2(a=783):
        while True:
            a *= 48271
            a = (a & 0x7fffffff) + (a >> 31)
            a = (a & 0x7fffffff) + (a >> 31)
            yield a
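Before benchmarking, it seemed worth confirming that the two variants produce the same stream (the definitions are repeated here so the snippet runs on its own):

```python
from itertools import islice

def gen1(a=783):  # reference implementation, repeated for a standalone check
    while True:
        a = (a * 48271) % 0x7fffffff
        yield a

def gen2(a=783):  # folded-reduction variant, repeated for a standalone check
    while True:
        a *= 48271
        a = (a & 0x7fffffff) + (a >> 31)
        a = (a & 0x7fffffff) + (a >> 31)
        yield a

# The two generators should agree term by term: the double fold
# reduces the product fully modulo the Mersenne prime 2**31 - 1.
assert list(islice(gen1(), 10_000)) == list(islice(gen2(), 10_000))
```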
Basic benchmark script:
    import time, sys

    g1 = gen1()
    g2 = gen2()

    for g in g1, g2:
        t0 = time.perf_counter()
        for i in range(int(sys.argv[1])):
            next(g)
        print(g.__name__, time.perf_counter() - t0)
The performance is improved under PyPy (7.3.0 @ 3.6.9), for example when generating 100M terms:
$ pypy lcg.py 100000000
gen1 0.4366550260456279
gen2 0.3180829349439591
Unfortunately, the performance is actually degraded in CPython (3.9.0 / Linux):
$ python3 lcg.py 100000000
gen1 20.650125587941147
gen2 26.844335232977755
My questions:
- Why is the bitwise arithmetic, usually touted as an optimization, actually slower than a single modulo operation in CPython?
- Can you improve the performance of this PRNG under CPython some other way, perhaps using numpy or ctypes?
Note that arbitrary-precision integers are not strictly required here, because this generator will never yield numbers with a bit length greater than:
>>> 0x7fffffff.bit_length()
31