First of all, this code is overly clever and a great example of why not to use recursion in Python:
```
>>> g = nats(10)
>>> [next(g) for _ in range(1000)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 3, in nats
  File "<stdin>", line 3, in nats
  File "<stdin>", line 3, in nats
  [Previous line repeated 994 more times]
RecursionError: maximum recursion depth exceeded
```
We can't even generate 1000 natural numbers or primes without blowing the call stack.
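That ceiling comes from CPython's recursion guard. You can inspect it and even raise it, but for this algorithm that's only a band-aid, since every additional prime still costs another stack frame:

```python
import sys

# The default recursion limit is usually 1000 frames.
print(sys.getrecursionlimit())

# Raising it buys headroom but doesn't fix the design: memory use
# is still linear in the number of stacked recursive frames.
sys.setrecursionlimit(5000)
```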
Digging into the code, let's start with `yield from` and `nats`. `yield from` gives recursive calls the ability to `yield` results to the generator returned by the original calling code. `nats` generates an infinite sequence of natural numbers from `n` to infinity.
Actually, `nats` already exists in Python as `itertools.count`. That won't blow the stack:

```
>>> from itertools import count
>>> g = count(10)
>>> len([next(g) for _ in range(10000000)])
10000000
```
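If you'd rather not drive the iterator with `next()` in a comprehension, `itertools.islice` is the idiomatic way to take a finite slice of an infinite iterator:

```python
from itertools import count, islice

# Take the first 5 values from an infinite counter starting at 10.
print(list(islice(count(10), 5)))  # [10, 11, 12, 13, 14]
```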
If you had to write `nats` yourself, you'd do it more directly and safely with a loop (`itertools.count`'s implementation is similar):

```python
def nats(start=0):
    while True:
        yield start
        start += 1
```
We can see based on `nats` that generators offer state: results aren't returned until you ask for them with `next()`, and execution is paused after each `yield`. This is useful for infinite sequences because we can take what we want, when we want, without using extra space to store a list of all previous numbers along the way or restarting from scratch.
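To make that pause/resume behavior concrete, here's the loop-based `nats` driven by hand with a couple of `next()` calls:

```python
def nats(start=0):
    while True:
        yield start   # execution pauses here until the next next() call
        start += 1    # ...and resumes here

g = nats(100)
print(next(g))  # 100
print(next(g))  # 101 -- the generator remembered its state between calls
```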
While I'm ripping on it, `nats` is not the greatest name; it's unclear what it means absent context, and the function works fine on non-natural numbers like negative numbers.
`sieve` does the same sort of thing as `nats`, recursively stepping forward prime by prime. Each recursive call creates a new generator, `(i for i in s if i % n != 0)`, that performs the sieving based on output from the previous generator `s` (`s` should be called something like `last_sieve`). This generator skips any number that's a multiple of `n`, the first prime emitted by the previous generator in the last recursive call.
The critical realization is that the generators don't just disappear: they stick around in a call frame filtering on one particular prime and continue to be invoked by future generators in deeper frames.
It's sort of like a bucket brigade. The first generator sends a stream of all numbers to the second generator, the second generator filters out numbers `% 2`, the third generator filters further on `% 3`, the fourth generator filters the stream on `% 5`... Every frame, the generator chain grows by one, and numbers have to pass through more and more filters to be considered prime.
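You can build a tiny version of that brigade by hand with a few chained generator expressions; the first survivors of the two filters below happen to be the next few primes:

```python
from itertools import count, islice

g1 = count(2)                    # 2, 3, 4, 5, ...
g2 = (n for n in g1 if n % 2)    # drop multiples of 2: 3, 5, 7, 9, ...
g3 = (n for n in g2 if n % 3)    # drop multiples of 3: 5, 7, 11, 13, ...

# Each next(g3) pulls a value through the whole chain,
# just like the sieve's stack of recursive frames.
print(list(islice(g3, 4)))  # [5, 7, 11, 13]
```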
Here's an iterative version of the algorithm that doesn't blow the stack and has some debug prints that show you the workings of the generators. You can see which numbers are being rejected on each step (the numbers in brackets are the unique monotonically increasing identifier of each generator):
```python
from itertools import count

def make_debuggable_gen(gen_expr, identifier):
    while True:
        val = next(gen_expr)
        print(f"[{identifier}] emitting '{val}'")
        yield val
        # note: no need to except StopIteration since our generators are infinite

def make_prime_gen(last_gen, prime, identifier):
    return make_debuggable_gen((n for n in last_gen if n % prime), identifier)

def sieve():
    identifier = 0
    prime_gen = make_prime_gen(count(2), -float("inf"), identifier)
    while True:
        prime = next(prime_gen)
        yield prime
        identifier += 1
        prime_gen = make_prime_gen(prime_gen, prime, identifier)

if __name__ == "__main__":
    s = sieve()
    for _ in range(6):
        print(next(s))
```
Sample run:

```
[0] emitting '2'
2
[0] emitting '3'
[1] emitting '3'
3
[0] emitting '4'
[0] emitting '5'
[1] emitting '5'
[2] emitting '5'
5
[0] emitting '6'
[0] emitting '7'
[1] emitting '7'
[2] emitting '7'
[3] emitting '7'
7
[0] emitting '8'
[0] emitting '9'
[1] emitting '9'
[0] emitting '10'
[0] emitting '11'
[1] emitting '11'
[2] emitting '11'
[3] emitting '11'
[4] emitting '11'
11
[0] emitting '12'
[0] emitting '13'
[1] emitting '13'
[2] emitting '13'
[3] emitting '13'
[4] emitting '13'
[5] emitting '13'
13
```
Hopefully this answers your questions, but to be explicit:
- `i` is an integer emitted by the previous generator `s` (which we're calling `last_sieve`) from the previous call frame.
- Hopefully answered by the debug output above -- the second generator (id 1) has `n = 2` because that was the first prime emitted by generator id 0. The sequence of values generator id 1 passes on will be 3, 5, 7... It rejects any even numbers (`% 2 == 0`), and when it hits 3, it creates the next generator with id 2 that filters out all numbers `% 3`.
- The condition `i % n != 0` filters the stream of numbers based on whether they're divisible by the one prime `n` that this particular generator on this particular call frame cares about. The prime `n` represents the first prime the previous generator in the chain found (it should probably be called `prime` or `last_prime`).
- The difference between the initial call, `sieve(nats(2))`, and the `i`-th call is that the `i`-th call is seeded with a generator from the `i-1`-th call that has filtering on it for a certain prime. The first call frame, on the other hand, has no filtering on it, just `nats`, which counts monotonically by 1.
- The `for` loop is just a plain generator expression, which is basically a stateful, lazy list comprehension. All it does is infinitely pull numbers from `s`, and it won't emit any that don't pass the filter, which in our case is a modulus test for divisibility.
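If it helps, the generator expression can be spelled out as an equivalent generator function (the name `filtered` is mine, not the original code's):

```python
def filtered(s, n):
    # behaves the same as the expression (i for i in s if i % n != 0)
    for i in s:
        if i % n != 0:
            yield i

# Drop even numbers from a small sample stream:
print(list(filtered(iter([2, 3, 4, 5, 6, 7]), 2)))  # [3, 5, 7]
```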
Finally, here's a cleaned up version of the above code without debugging:

```python
from itertools import count

def make_prime_gen(last_gen, prime):
    return (n for n in last_gen if n % prime)

def sieve():
    prime_gen = count(2)
    while True:
        prime = next(prime_gen)
        yield prime
        prime_gen = make_prime_gen(prime_gen, prime)

if __name__ == "__main__":
    s = sieve()
    for _ in range(6):
        print(next(s))
```
Note that the function `make_prime_gen` acts as a closure over `prime` in a similar manner as the original code lets each generator keep track of `n` in its own call frame. We don't have to use a function here, but it's a convenient idiom for keeping track of all of the primes for each generator without keeping a list around.
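The function isn't just a stylistic choice, either. If you inline the generator expression into the loop, every generator in the chain closes over the same `prime` variable, which is rebound on each iteration, and the sieve quietly stops sieving. A sketch of that late-binding bug:

```python
from itertools import count

def broken_sieve():
    prime_gen = count(2)
    while True:
        prime = next(prime_gen)
        yield prime
        # BUG: 'prime' is looked up lazily when each generator runs,
        # so every filter in the chain tests against the most recently
        # yielded value instead of its own prime.
        prime_gen = (n for n in prime_gen if n % prime)

s = broken_sieve()
print([next(s) for _ in range(6)])  # [2, 3, 4, 5, 6, 7] -- not primes!
```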
Even without the inexcusable recursion, the space complexity of this function is a serious drawback that seems to pretty much defeat the idea behind generators. Creating a whole new generator per prime is a serious red flag. Instead of having a simple array or set of previous primes hanging around in the traditional sieve, we have a bunch of generator objects and call frames.
From an efficiency standpoint, not only does the first generator need to step over every number, but each number also needs to pass through an ever-growing chain of generators before it can be emitted. That's similar to the nested loop of the naive algorithm, but the naive algorithm can take advantage of various baked-in skips in its main loop described on Wikipedia, not to mention less call overhead and probably better cache locality.
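For comparison, here's roughly what the "simple list of previous primes" approach looks like -- a trial-division sketch of my own, not the original author's code, using one frame and one flat list instead of a chain of generators:

```python
def primes():
    found = []       # all primes seen so far, in one flat list
    candidate = 2
    while True:
        # A candidate is prime if no earlier prime divides it.
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1

g = primes()
print([next(g) for _ in range(6)])  # [2, 3, 5, 7, 11, 13]
```

In practice you'd only test primes up to the square root of the candidate, but even this naive version keeps its state in data rather than in stack frames.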