I don't have a py2 to play with, so the following benchmarks just compare different implementation details in py3. All the benchmarks were done in IPython 7.22.0 running a Python 3.8.8 kernel, using `time.process_time`. I took the middle of three times for each run. Results are meaningful to about 1s, or ~3% accuracy.
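The methodology above can be sketched as a small harness (`bench` is my own name, not from the original):

```python
from time import process_time

def bench(fn, repeats=3):
    """Time fn() `repeats` times and return the middle measurement."""
    times = []
    for _ in range(repeats):
        p = process_time()
        fn()
        times.append(process_time() - p)
    return sorted(times)[repeats // 2]

# example: time a cheap loop
print(bench(lambda: sum(range(10**5))))
```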
With the original code, the loop takes 35.36s.
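The original code isn't reproduced here, but judging from the variants below it presumably looked something like this (a sketch; the iteration count is scaled down so it runs quickly, where the benchmark used 10**8):

```python
from time import process_time

def prng(n):
    # https://en.wikipedia.org/wiki/Lehmer_random_number_generator
    while True:
        n = n * 48271 % 0x7fffffff  # plain py3 ints: infinite precision
        yield n

g = prng(123)
p = process_time()
for i in range(10**6):  # the benchmark used 10**8
    next(g)
q = process_time()
print(q - p, ':', next(g))
```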
You can make all the numbers into the appropriate fixed-width numpy types. That way, you avoid the implicit conversion of py2's fixed-width ints into py3's infinite-precision ints:
import numpy as np
from time import process_time

def prng(n):
    # https://en.wikipedia.org/wiki/Lehmer_random_number_generator
    a = np.uint64(48271)
    b = np.uint64(0x7fffffff)
    n = np.uint64(n)
    while True:
        n = n * a % b
        yield n

g = prng(123)
p = process_time()
for i in range(10**8):
    next(g)
q = process_time()
print(q - p, ':', next(g))
The runtime is reduced to 28.05s: a drop of ~21%. BTW, using global `a` and `b` drops the time by only ~5%, to 33.55s.
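For reference, the global-variable variant mentioned above looks something like this (a sketch of what was measured; the names are capitalized here only to mark them as module-level constants):

```python
import numpy as np

# created once at module level, instead of on every prng() call
A = np.uint64(48271)
B = np.uint64(0x7fffffff)

def prng(n):
    # https://en.wikipedia.org/wiki/Lehmer_random_number_generator
    n = np.uint64(n)
    while True:
        n = n * A % B  # still pays numpy's dispatch cost per operation
        yield n
```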
As @Andrej Kesely suggested, a better way to simulate the fixed-width ints of py2 is to use py3's `float`, rather than calling on numpy's dispatching machinery every time:
def prng(n):
    # https://en.wikipedia.org/wiki/Lehmer_random_number_generator
    while True:
        n = n * 48271.0 % 2147483647.0
        yield n

g = prng(123.0)
p = process_time()
for i in range(10**8):
    next(g)
q = process_time()
print(q - p, ':', next(g))
And in fact, we see a runtime of 23.63s, which represents a 33% reduction from the original.
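The `float` version is not an approximation, by the way: every intermediate product fits in a double's 53-bit mantissa (48271 × 2147483647 ≈ 1.04 × 10^14 < 2^53), so the int and float generators produce identical streams. A quick check, with the count scaled down:

```python
def prng_int(n):
    while True:
        n = n * 48271 % 0x7fffffff
        yield n

def prng_float(n):
    while True:
        n = n * 48271.0 % 2147483647.0
        yield n

gi, gf = prng_int(123), prng_float(123.0)
# exact agreement: no intermediate product exceeds 2**53
assert all(next(gi) == next(gf) for _ in range(10**5))
print('int and float streams agree')
```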
To bypass the generator API, let's rewrite the loop without the generator:
n = 123
p = process_time()
for i in range(10**8):
    n = n * 48271 % 0x7fffffff
q = process_time()
print(q - p, n * 48271 % 0x7fffffff)
This runtime is only 26.28s, which is an improvement of ~26%.
Doing the same thing, but with a function call, will only save you ~3% (runtime of 34.33s):
def prng(n):
    return n * 48271 % 0x7fffffff

n = 123
p = process_time()
for i in range(10**8):
    n = prng(n)
q = process_time()
print(q - p, prng(n))
Using `float` speeds up the function version as much as it did the generator:
def prng(n):
    return n * 48271.0 % 2147483647.0

n = 123.0
p = process_time()
for i in range(10**8):
    n = prng(n)
q = process_time()
print(q - p, prng(n))
A runtime of 22.97s is a drop of an additional 33%, just like we saw with the generator.
Running the loop-only solution with `float` also helps a lot:
n = 123.0
p = process_time()
for i in range(10**8):
    n = n * 48271.0 % 2147483647.0
q = process_time()
print(q - p, n * 48271.0 % 2147483647.0)
The runtime is 12.72s: a 64% drop from the original, and 52% down from the `int` loop version.
Clearly the datatype is a significant source of slowness here, but it is also quite likely that python 3's generator machinery adds another ~20% to the runtime. Removing both sources of slowness gets us a runtime less than half that of the original code.
It is not completely clear how much of the remainder, after removing infinite-precision types, is caused by the generator vs the `for` loop machinery. So let's get rid of the `for` loop to see what happens:
from itertools import islice
from collections import deque

def prng(n):
    # https://en.wikipedia.org/wiki/Lehmer_random_number_generator
    while True:
        n = n * 48271 % 0x7fffffff
        yield n

g = prng(123)
p = process_time()
deque(islice(g, 10**8), maxlen=0)
q = process_time()
print(q - p, ':', next(g))
The runtime is 21.32s, which is 40% faster than the original code, indicating that the `for` loop implementation may have become more robust, and therefore more cumbersome, in py3 as well.
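The `deque(islice(g, N), maxlen=0)` line is an idiom for draining an iterator at C speed: `islice` limits the iterator to its first N items, and a `deque` with `maxlen=0` consumes them without storing anything. A minimal illustration:

```python
from collections import deque
from itertools import islice

def gen():
    yield from range(5)

g = gen()
deque(islice(g, 3), maxlen=0)  # consumes 0, 1, 2 entirely in C
print(next(g))  # → 3
```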
It gets even better with `float` in `prng` (exactly as in the `float` generator example above). Now the runtime is 10.09s: a drop of 71%, or ~3x faster than the original code.
One more testable difference, suggested by @chepner, is that py2's `range(10**8)` is equivalent to py3's `list(range(10**8))`. This is important for the exact reason that generators seem to be slower in py3.
def prng(n):
    # https://en.wikipedia.org/wiki/Lehmer_random_number_generator
    while True:
        n = n * 48271.0 % 2147483647.0
        yield n

g = prng(123.0)
r = list(range(10**8))
p = process_time()
for i in r:
    next(g)
q = process_time()
print(q - p, ':', next(g))
This version takes 20.62s, which is about 13% faster than the same code with a lazily generated `range`, and 42% better than the original code. So clearly, generator machinery is a significant factor here as well.