
In the following code, I assume I have two generators yielding sorted, comparable values, and I want to make a generator that yields "synchronized" pairs from the two. By synchronized I mean: when both generators yield the same value, yield that value from both as a pair; otherwise, advance only the "delayed" one, pairing what it yields with None.

from itertools import repeat

def generate_pairs(g1, g2):
    try:
        n1 = next(g1)
    except StopIteration:
        yield from zip(repeat(None), g2)
        # A
        # raise StopIteration
    try:
        n2 = next(g2)
    except StopIteration:
        yield from zip(g1, repeat(None))
        # A
        # raise StopIteration
    while True:
        if n1 > n2:
            yield (None, n2)
            try:
                n2 = next(g2)
            except StopIteration:
                yield (n1, None)
                yield from zip(g1, repeat(None))
                # B
                # raise StopIteration
        elif n1 < n2:
            yield (n1, None)
            try:
                n1 = next(g1)
            except StopIteration:
                yield (None, n2)
                yield from zip(repeat(None), g2)
                # B
                # raise StopIteration
        else:
            yield (n1, n2)
            try:
                n1 = next(g1)
            except StopIteration:
                yield from zip(repeat(None), g2)
                # C
                # raise StopIteration
            try:
                n2 = next(g2)
            except StopIteration:
                yield from zip(g1, repeat(None))
                # C
                # raise StopIteration

Where should I explicitly raise StopIteration?

With the code as shown (every raise commented out), a test with already synchronized generators suggests that raising in case C is required.

pairs = generate_pairs((n1 for n1 in [1, 2, 3]), (n2 for n2 in [1, 2, 3]))

The above can go on yielding the last pair (3, 3) forever:

from cytoolz import take
list(take(10, pairs))

Output:

[(1, 1),
 (2, 2),
 (3, 3),
 (3, 3),
 (3, 3),
 (3, 3),
 (3, 3),
 (3, 3),
 (3, 3),
 (3, 3)]

In B too, it seems a manual StopIteration should be raised:

pairs = generate_pairs((n1 for n1 in [1, 3]), (n2 for n2 in [1, 2]))
list(take(10, pairs))

Output:

[(1, 1),
 (None, 2),
 (3, None),
 (None, 2),
 (3, None),
 (None, 2),
 (3, None),
 (None, 2),
 (3, None),
 (None, 2)]

And the test below suggests that case A also needs some way of ending the generator:

pairs = generate_pairs((_ for _ in []), (n2 for n2 in [1, 2, 3]))
list(take(10, pairs))

Output:

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-96-61eb4df81d52> in <module>
----> 1 list(take(10, pairs))

<string> in generate_pairs(g1, g2)

UnboundLocalError: local variable 'n1' referenced before assignment
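All three symptoms above look consistent with one mechanism: once an except block finishes, execution simply continues with whatever follows the try statement. A minimal sketch of that fall-through, using a hypothetical helper named fall_through that is not part of the original code:

```python
def fall_through(it):
    """Hypothetical helper reproducing the case A pattern in isolation."""
    try:
        first = next(it)
    except StopIteration:
        yield "exhausted"
        # Nothing ends the generator here, so execution continues below.
    # If the except branch ran, `first` was never assigned.
    yield first


gen = fall_through(iter([]))
print(next(gen))  # the value yielded inside the except block
try:
    next(gen)
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```

In cases B and C the variables n1 and n2 are already bound, so the same fall-through would re-enter the while loop with stale values instead of failing, which would match the repeated pairs in the outputs above.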

However, if I uncomment all the raise StopIteration in the code, I need to handle the resulting exceptions manually. They are not automatically handled in for loops, for instance.
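A likely explanation for this behaviour, assuming Python 3.7 or later: under PEP 479, a StopIteration raised inside a generator body is converted into a RuntimeError, which a for loop does not swallow. The two toy generators below (hypothetical names, not from the code above) contrast raising with returning:

```python
def stops_by_raising():
    yield 1
    # Under PEP 479 (Python 3.7+), this surfaces to the caller as RuntimeError.
    raise StopIteration


def stops_by_returning():
    yield 1
    # A plain return ends the generator; the caller sees normal exhaustion.
    return


try:
    for value in stops_by_raising():
        print(value)
except RuntimeError as err:
    print("RuntimeError:", err)

# The returning version ends quietly and can be consumed by list() or for.
print(list(stops_by_returning()))
```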

I would just like my generator of pairs to stop generating things once both input generators have been exhausted, without drama. What did I get wrong?

Edit

It seems that using return instead of raise StopIteration fixes my code nicely. I'm still interested in some explanations, though.
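For reference, here is a sketch of what the return-based version might look like. It is a reconstruction under assumptions, not necessarily the exact code that was tested. It also adds one change beyond swapping raise for return: when g2 runs out right after a fresh n1 has been read (second case A and second case C), the sketch yields (n1, None) first, since that value would otherwise be dropped.

```python
from itertools import repeat


def generate_pairs(g1, g2):
    """Yield synchronized pairs from two sorted iterators (sketch)."""
    try:
        n1 = next(g1)
    except StopIteration:
        yield from zip(repeat(None), g2)
        return  # A: g1 was empty
    try:
        n2 = next(g2)
    except StopIteration:
        yield (n1, None)  # assumption: n1 was already consumed, so emit it
        yield from zip(g1, repeat(None))
        return  # A: g2 was empty
    while True:
        if n1 > n2:
            yield (None, n2)
            try:
                n2 = next(g2)
            except StopIteration:
                yield (n1, None)
                yield from zip(g1, repeat(None))
                return  # B
        elif n1 < n2:
            yield (n1, None)
            try:
                n1 = next(g1)
            except StopIteration:
                yield (None, n2)
                yield from zip(repeat(None), g2)
                return  # B
        else:
            yield (n1, n2)
            try:
                n1 = next(g1)
            except StopIteration:
                yield from zip(repeat(None), g2)
                return  # C
            try:
                n2 = next(g2)
            except StopIteration:
                yield (n1, None)  # assumption: emit the freshly read n1
                yield from zip(g1, repeat(None))
                return  # C
```

With this version, list(generate_pairs(iter([1, 3]), iter([1, 2]))) gives [(1, 1), (None, 2), (3, None)] and the generator then ends on its own, so no take is needed to bound the output.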

bli
  • https://www.python.org/dev/peps/pep-0479/ –  Oct 10 '19 at 14:23
  • @JETM Thanks for the link. I also found https://stackoverflow.com/a/30217723/1878788 stating that the normal way to end a generator is to return. – bli Oct 10 '19 at 14:52
