4

Topic : Laziness in Python - Computerphile
URL : https://www.youtube.com/watch?v=5jwV3zxXc8E

From this example it tried to generate prime numbers from a Infinity series of number from 2 to Infinity in Python with

yield and yield from feature

Code:

def nats(n):
    yield n
    yield from nats(n+1)

def sieve(s):
    n = next(s)
    yield n 
    yield from sieve(i for i in s if i%n!=0)

p = sieve(nats(2))
next(p)
next(p)

In function nats it generate Infinity number start from n and yield n every time when next() was called

series = nats(1)
next(series) # output 1
next(series) # output 2
....
next(series) # output n    



Question
In function sieve it calls generator object and yield n which come from s ;s is nats(2) in this example
The next line is the most tricky part

inside the blanket

sieve(i for i in s if i%n!=0)

its loop through the object s when if statement triggered it return i which is

sieve(i)

My question is

  1. what i actually is ? an integer number or a generator object
  2. suppose n = 2 what is the for loop sequence look like
    i % 2 I don't know what is i should be
  3. when it trigger if i%n!=0 it return the k iter i or just kth single number / object
  4. after all above finally it becomes sieve( i ) what's the difference from sieve(nats( 2 ))
  5. can someone step by step walk through how the for loop works
  • I'm not sure I understand what's being asked, but there is no `sieve(int)` being called like you seem to think. `sieve` is called with a generator object created by `i for i in s if i%n!=0`. Consider the simpler code: `(i for i in range(0))` which returns ` at 0x7f82fd061970>` if you run this on a repl. That's what's going into the function. – ggorlen Nov 11 '20 at 06:43
  • I quite confusing about the code especially the for loop – Linear Algebra fans Nov 11 '20 at 06:48
  • It seems so. The "loop" is a generator expression--it's not like it does any iteration on the spot. It doesn't do anything other than create the generator object. It's not until you hit a `yield` inside the function that you begin extracting integers from it. Same if you call `next` on it--that's the whole point of laziness--nothing happens up front and you pick items out of the object later on. – ggorlen Nov 11 '20 at 06:52

3 Answers3

2
  1. sieve is always called with a generator object. As noted, nats(n) (which does take an int) is a generator since it contains a yield. So that explains p = sieve(nats(2)). Let's look at what happens when you call sieve(i for i in s if i%n!=0). That bit in the middle is actually a generator itself! It is not calling sieve with any specific value of i but with the way to generate i.

So when you call sieve like that, the first line is n = next(s). This asks for the next value which s produces. That's part of what the video is trying to explain is that these generators do stuff as needed. (i for i in s if i%n!=0) does not immediately find an i and pass it into sieve, if provides sieve a way to find an i when it needs. This is what next(s) is doing, it asks to actually find a specific i. In this case, it will look for the next value in the original s (remember that s now means something different) where i%n != 0.

  1. So let's say that n was equal to 2. This does not mean that s was 2, but that the first value that you got from s was 2. So when it calls sieve again, the next value will be three which causes 3%2 != 0 to be true and thus it is yielded from the generator and so on it goes.

  2. So this goes back to how (i for i in s if i%n!=0) is a generator. It won't return anything right way, it won't give a number to sieve. It is only after it calls sieve and that new call asks for the next value that it will grab the next one. So it does produce a number, but only when you call next(s) to ask for it.

  3. I hope that you can see now that it does not ever call sieve(i), it only ever calls sieve with a generator. nats(2) is a generator, but so is (i for i in s if i%n!=0).

I think that's also explained number five for you but if you are still confused then I (or someone else) can show exactly what is happening though it is hard to explain with text only.

Bryce
  • 121
  • 3
0

Finally I realize what happen Firstly

yield from sieve(i for i in s if i%n!=0)

is only a expression just like regular expression it provides pattern only
actually the expression look like:

1th time u visit the code   
s1 = i for i in nats(2) if i % 2! = 0

2nd time u visit
s2 = i for i in s1 if i % 3! = 0

3rd time u visit
s3 = i for i in s2 if i % 5! = 0

....

nth time u visit
sn = i for i in (sn-1) if i % n_prime ! = 0

every time u visit n = next(s) then nats(2) will increase 1 and start operating with pattern

For example Let let say we have s3 pattern in line n = next(s)
1.First it run s1 with 6 % 2 it return nothing since it cannot full fill the if statement
2.It run s1 again with 7 % 2 this time it full fill the if statement

then pass it to s2 it also full fill the if statement this time
then pass it to s3 it also full fill the if statement
because no more expression left
it output the result with yield n which is 7

If it is wrong I hope someone correct it

0

First of all, this code is overly-clever and a great example of why not to use recursion in Python:

>>> g = nats(10)
>>> [next(g) for _ in range(1000)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
File "<stdin>", line 3, in nats
File "<stdin>", line 3, in nats
File "<stdin>", line 3, in nats
[Previous line repeated 994 more times]
RecursionError: maximum recursion depth exceeded

We can't even generate 1000 natural numbers or primes without blowing the call stack.

Digging into the code, let's start with yield from and nats. yield from gives recursive calls the ability to yield results to the generator returned by the original calling code. nats generates an infinite sequence of natural numbers from n to infinity.

Actually, nats already exists in Python as itertools.count. That won't blow the stack:

>>> from itertools import count
>>> g = count(10)
>>> len([next(g) for _ in range(10000000)])
10000000

If you had to write nats yourself, you'd do it more directly and safely with a loop (itertools.count's implementation is similar):

def nats(start=0):
    while True:
        yield start
        start += 1

We can see based on nats that generators offer state; results aren't returned until you ask for them with next(), and execution is paused after each yield. This is useful for infinite sequences because we can take what we want, when we want, without using extra space to store a list of all previous numbers along the way or restarting from scratch.

While I'm ripping on it, nats is not the greatest name; it's unclear what it means absent context, and the function works fine on non-natural numbers like negative numbers.


sieve does the same sort of thing as nats, recursively stepping forward prime-by-prime. Each recursive call creates a new generator that performs the sieving based on output from the previous generator s (s should be called something like last_sieve), (i for i in s if i%n != 0). This generator skips any number that's a multiple of the first prime number generated by the previous generator in the last recursive call, n.

The critical realization is that the generators don't just disappear: they stick around in a call frame filtering on one particular prime and continue to be invoked by future generators in deeper frames.

It's sort of like a bucket brigade. The first generator sends a stream of all numbers to the second generator, the second generator filters out numbers % 2, the third generator filters further on % 3, the four generator filters the stream on % 5... Every frame, the generator chain gets 1 longer, and numbers have to go through more and more filters to be considered prime.

Here's an iterative version of the algorithm that doesn't blow the stack and has some debug prints that show you the workings of the generators. You can see which numbers are being rejected on each step (the numbers in brackets are the unique monotonically increasing identifier of each generator):

from itertools import count

def make_debuggable_gen(gen_expr, identifier):
    while True:
        val = next(gen_expr)
        print(f"[{identifier}] emitting '{val}'")
        yield val
        # note: no need to except StopIteration since our generators are infinite

def make_prime_gen(last_gen, prime, identifier):
    return make_debuggable_gen((n for n in last_gen if n % prime), identifier)

def sieve():
    identifier = 0
    prime_gen = make_prime_gen(count(2), -float("inf"), identifier)

    while True:
        prime = next(prime_gen)
        yield prime
        identifier += 1
        prime_gen = make_prime_gen(prime_gen, prime, identifier)

if __name__ == "__main__":
    s = sieve()
    
    for _ in range(6):
        print(next(s))

Sample run:

[0] emitting '2'
2
[0] emitting '3'
[1] emitting '3'
3
[0] emitting '4'
[0] emitting '5'
[1] emitting '5'
[2] emitting '5'
5
[0] emitting '6'
[0] emitting '7'
[1] emitting '7'
[2] emitting '7'
[3] emitting '7'
7
[0] emitting '8'
[0] emitting '9'
[1] emitting '9'
[0] emitting '10'
[0] emitting '11'
[1] emitting '11'
[2] emitting '11'
[3] emitting '11'
[4] emitting '11'
11
[0] emitting '12'
[0] emitting '13'
[1] emitting '13'
[2] emitting '13'
[3] emitting '13'
[4] emitting '13'
[5] emitting '13'
13

Hopefully this answers your questions, but to be explicit:

  1. i is an integer emitted by the previous generator s (we're calling this "last_sieve") from the previous call frame.
  2. Hopefully answered by the debug output above -- the second generator (id 1) has n = 2 because that was the first prime emitted by generator id 0. Generator id 1's sequence of values it passes will be 3, 5, 7... It rejects any even numbers (% 2 == 0), and when it hits 3, it creates the next generator with id 2 that filters out all numbers % 3.
  3. The condition i % n != 0 filters the stream of numbers based on whether they're divisible by the one prime n that this particular generator on this particular call frame cares about. The prime n represents the first prime the previous generator in the chain found (it should probably be called prime or last_prime).
  4. The difference between the initial call, sieve(nats(2)), and the i-th call is that the i-th call is seeded with a generator from the i-1-th call that has filtering on it for a certain prime. On the other hand, the first call frame has no filtering on it, just nats which counts monotonically by 1.
  5. The for loop is just a plain generator expression which is basically a stateful, lazy list comprehension. All it does is infinitely pull numbers from s and won't emit any that don't pass the filter, which in our case is a modulus to test divisibility.

Finally, here's a cleaned up version of the above code without debugging:

from itertools import count

def make_prime_gen(last_gen, prime):
    return (n for n in last_gen if n % prime)

def sieve():
    prime_gen = count(2)

    while True:
        prime = next(prime_gen)
        yield prime
        prime_gen = make_prime_gen(prime_gen, prime)

if __name__ == "__main__":
    s = sieve()
    
    for _ in range(6):
        print(next(s))

Note that the function make_prime_gen acts as a closure over prime in a similar manner as the original code lets each generator keep track of n in its own call frame. We don't have to use a function here, but it's a convenient idiom for keeping track of all of the primes for each generator without keeping a list around.

Even without the inexcusable recursion, the space complexity of this function is a serious drawback that seems to pretty much defeat the idea behind generators. Creating a whole new generator per prime is a serious red flag. Instead of having a simple array or set of previous primes hanging around in the traditional sieve, we have a bunch of generator objects and call frames.

From an efficiency standpoint, not only does the first generator need to step over every number, but it also needs to pass it through an ever-increasing chain of generators to get to the point where it can be emitted. That's similar to the nested loop of the naive algorithm, but the naive algorithm can take advantage of various baked-in skips in its main loop described in on Wikipedia, not to mention less call overhead and probably better cache locality.

ggorlen
  • 44,755
  • 7
  • 76
  • 106