4

In this question @lazyr asks how the following code of izip_longest iterator from here works:

def izip_longest_from_docs(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
        yield counter()         # yields the fillvalue, or raises IndexError
    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except IndexError:
        pass

When I was trying to understand how it works I stumbled into the question: "What if IndexError is raised inside one of those iterators that are sent to izip_longest as parameters?".

Then I wrote some testing code:

from itertools import izip_longest, repeat, chain, izip

def izip_longest_from_docs(*args, **kwds):
    # The code is exactly the same as shown above
    ....

def gen1():
    for i in range(5):
        yield i

def gen2():
    for i in range(10):
        if i==8:
            raise IndexError #simulation IndexError raised inside the iterator
        yield i

for i in izip_longest_from_docs(gen1(),gen2(), fillvalue = '-'):
    print('{i[0]} {i[1]}'.format(**locals()))

print('\n')

for i in izip_longest(gen1(),gen2(), fillvalue = '-'):
    print('{i[0]} {i[1]}'.format(**locals()))

And it turned out that the function in itertools module and izip_longest_from_docs work differently.

The output of the code above:

>>> 
0 0
1 1
2 2
3 3
4 4
- 5
- 6
- 7


0 0
1 1
2 2
3 3
4 4
- 5
- 6
- 7

Traceback (most recent call last):
  File "C:/..., line 31, in <module>
    for i in izip_longest(gen1(),gen2(), fillvalue = '-'):
  File "C:/... test_IndexError_inside iterator.py", line 23, in gen2
    raise IndexError
IndexError

So, it's clearly seen, that the code of izip_longes from itertools did propagate IndexError exception (as I think it should), but izip_longes_from_docs 'swallowed' IndexError exception as it took it as a signal from sentinel to stop iterating.

My question is, how did they worked around IndexError propagation in the code in theitertools module?

Community
  • 1
  • 1
ovgolovin
  • 13,063
  • 6
  • 47
  • 78
  • @agf I wanted, but I couldn't find it. Thanks for the link! I will give it a look. – ovgolovin Sep 12 '11 at 20:16
  • @agf It seems to be in C :o) A bit too complicated for me. Nevertheless, what alleged to be an **equivalent** doesn't seem to be an absolute equivalent in terms of functionality (not speed). – ovgolovin Sep 12 '11 at 20:19
  • Yep, you're correct. "If you find a bug in this documentation or would like to propose an improvement, please send an e-mail to docs@python.org describing the bug and where you found it. If you have a suggestion how to fix it, include that as well." If you don't feel like it, I'll do it? – agf Sep 12 '11 at 20:24
  • @agf Please, do it. I don't feel very confident to propose any corrections. – ovgolovin Sep 12 '11 at 20:27
  • @agf The only solution I can now come up with is to modify `sentinel` that it would implement some counter and raise it's own special Exception when needed. – ovgolovin Sep 12 '11 at 20:32
  • @agf Also, if you can, please, write here, what solution you will have come up with for the pure Python equivalent to resemble the real function more closely. – ovgolovin Sep 12 '11 at 20:35
  • I got a couple of emails from Python devs -- they said the "equivalent" code only needs to be "approximately correct", that the "equivalent" code is only an "illustration" and that they don't even consider it a bug, it's just an "implementation detail". If you want to see a pure Python implementation that should function identically, you can take a look at the [PyPy itertools implementation](https://bitbucket.org/pypy/pypy/src/6b92b3aa1cbb/pypy/module/itertools/interp_itertools.py#cl-624). – agf Sep 17 '11 at 21:07
  • http://bugs.python.org/issue13003 has now been opened to track this, it's not clear if any action is going to be taken. – agf Sep 18 '11 at 20:18
  • @agf As I think, the 'equivalent' code should be corrected since it may only be approximately correct in terms of time-memory efficiency, but has to be exactly the same in terms of functionality. Not re-raising 'IndexError' is really what I stumbled over some time ago. I can't say it was that bad since it was the reason I started this question and some follow-up questions, which turned out to be very good in terms of studying. But it may cause some difficulties other people who will be reading the documentation. – ovgolovin Sep 18 '11 at 22:02
  • I agree with [Eli](http://stackoverflow.com/users/8206/eli-bendersky) (who created the bug report based on my email to the docs mailing list) that it needs to be made clear on the itertools documentation page page that the "equivalent" implementations differ in some details from the C implementations. – agf Sep 18 '11 at 22:48
  • @agf By the way, the concerns laid out in this discussion have been reflected in the new docs: http://docs.python.org/dev/library/itertools.html?highlight=itertools#itertools.zip_longest – ovgolovin Dec 02 '11 at 00:47
  • Thanks for letting me know -- I wouldn't have seen it otherwise. I left a comment on the bug that they should change the resolution from "wont fix" to "fixed". – agf Dec 02 '11 at 01:08

1 Answers1

3

in izip_longest_next in the code of izip_longest, no sentinel is used.

Instead, CPython keeps track of how many of the iterators are still active with a counter, and stops when the number active reaches zero.

If an error occurs, it ends iteration as if there are no iterators still active, and allows the error to propagate.

The code:

            item = PyIter_Next(it);
            if (item == NULL) {
                lz->numactive -= 1;
                if (lz->numactive == 0 || PyErr_Occurred()) {
                    lz->numactive = 0;
                    Py_DECREF(result);
                    return NULL;
                } else {
                    Py_INCREF(lz->fillvalue);
                    item = lz->fillvalue;
                    PyTuple_SET_ITEM(lz->ittuple, i, NULL);
                    Py_DECREF(it);
                }
            }

The simplest solution I see:

def izip_longest_modified(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    class LongestExhausted(Exception):
        pass
    def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
        try:
            yield counter()         # yields the fillvalue, or raises IndexError
        except:
            raise LongestExhausted
    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except LongestExhausted:
        pass
agf
  • 171,228
  • 44
  • 289
  • 238
  • Thanks! Still, I have to add, that to my mind it's not very good that an alleged **pure** Python **equivalent** has a slightly different functionality than the function itself. Strange, why did they put that equivalent this way. Anyway, to understand that code with `sentinel` was a good task for brain. – ovgolovin Sep 12 '11 at 20:24
  • @ovgo Try it again? I edited it after I first posted it. It seems to work for me. – agf Sep 12 '11 at 21:14
  • Yeah, it works! When I encountered that the first version that you provided didn't work, I started thinking and came up with the following code def sentinel(fillvalue = fillvalue, counter = [0]): def ret(): counter[0] += 1 if counter[0] == len(args): raise LongestExhausted yield fillvalue return ret() – ovgolovin Sep 12 '11 at 21:22
  • Oops. Seems, SO doesn't keep indentation in comments. I'll try to explain. I added counter = [0] so as to share this value with all the iterator returned by `sentinel` (as `list` is mutable). Still, curious is it OK to use counter = [0] in the parameter list. But I can't be put into the body if `sentinel` since [0] would be created gain and again on `sentinel` being called. And I didn't manage to put it just above the `sentinel` function since it lead to the problem of scopes and `global` keyword doesn't seem to let solve it. – ovgolovin Sep 12 '11 at 21:27
  • @ovgo Look up [mutable default arguments](http://stackoverflow.com/questions/1132941/least-astonishment-in-python-the-mutable-default-argument). Function argument defaults are evaluated at definition time, so are the same every time the function is run. This only matters for mutable defaults. On Python 3, You could use the "nonlocal" keyword to put `counter` outside `sentinal`. – agf Sep 12 '11 at 21:56
  • Thanks for the answer. I've already created a special question: http://stackoverflow.com/questions/7394651/what-is-the-best-way-to-share-a-value-by-all-the-generators-created-by-a-function – ovgolovin Sep 12 '11 at 22:12