1

Here I asked a question about izip_longest function from itertools module.

The code of it:

def izip_longest_from_docs(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
        yield counter()         # yields the fillvalue, or raises IndexError
    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except IndexError:
        pass

There appeared to be an error in the documentation in the pure Python equivalent of that function. The error was that the real function did and the abovementioned equivalent didn't propagate IndexError exceptions that were raised inside the generators sent as the function parameters.

@agf solved the problem and gave a corrected version of the pure Python equivalent.

But at the same time when he was writing his solution I made my own. And while making it I faced one problem which I hope will be unraveled by asking this question.

The code that I came up with is this:

def izip_longest_modified_my(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')

    class LongestExhausted(Exception):
        pass

    def sentinel(fillvalue = fillvalue, counter = [0]):
        def ret():
            counter[0] += 1
            if counter[0] == len(args):
                raise LongestExhausted
            yield fillvalue
        return ret()

    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except LongestExhausted:
        pass 

In the original code sentinel is a generator which implements lazy evaluation. So that counter() is returned only when it's actually needed by the iterator created using chain function.

In my code I added a counter which holds a list of one value [0]. The reason for that was to put a mutable object into some place where it can be accessed by all the returned iterators ret() and changed by them. The only place I found suitable was in the function_defaults of sentinel.

If I put it inside the sentinel function, then the counter would be assigned to [0] on every call of sentinel and that would be different lists for all the ret()s:

def sentinel(fillvalue = fillvalue):
    counter = [0]
    def ret():
        counter[0] += 1
        if counter[0] == len(args):
            raise LongestExhausted
        yield fillvalue
    return ret()

I tried to put it outside of the sentinel function:

counter = 0
def sentinel(fillvalue = fillvalue):
    def ret():
        counter += 1
        if counter == len(args):
            raise LongestExhausted
        yield fillvalue
    return ret()

But the exception rose: UnboundLocalError: local variable 'counter' referenced before assignment.

I added global keyword, but it didn't help (I think because counter is really not in the global scope):

counter = 0
def sentinel(fillvalue = fillvalue):
    global counter
    def ret():
        counter += 1
        if counter == len(args):
            raise LongestExhausted
        yield fillvalue
    return ret()

So, my question is:

Is the approach that I used (to put mutable list counter = [0] to function_defaults) the best in this case, or there is some better way to solve this problem?

Community
  • 1
  • 1
ovgolovin
  • 13,063
  • 6
  • 47
  • 78
  • 2
    This has been asked many times in many forms. Read any number of other questions about mutable default arguments and the new Python 3 `nonlocal` keyword. – agf Sep 12 '11 at 22:23
  • In your last code snippet, `counter` is local to `ret`, not to `sentinel`, so if you want to use the global (i.e. module-level) `counter`, you need to put the `global` declaration in `ret`. (I haven't really absorbed the rest of your question; I'm only responding to your `UnboundLocalError`.) – John Y Sep 12 '11 at 22:27

2 Answers2

1

This has been asked many times in many forms. Read any number of other questions about mutable default arguments and the new Python 3 nonlocal keyword. On Python 2, you could use a function attribute:

def sentinel(fillvalue = fillvalue):
    def ret():
        sentinel.counter += 1
        if sentinel.counter == len(args):
            raise LongestExhausted
        yield fillvalue
    return ret()
sentinel.counter = 0

or use global both inside ret and inside izip_longest so you're always referencing a global variable:

global counter
counter = 0
def sentinel(fillvalue = fillvalue):
    def ret():
        global counter
        counter += 1
        if counter == len(args):
            raise LongestExhausted
        yield fillvalue
    return ret()

However, using global restricts you to only one izip_longest at a time -- see the comments on the other answer.

You're also defining a new ret every time sentinel is called (once per iterator) -- you could instead do something like

global counter
counter = 0
arglen = len(args)

def ret():
    global counter
    counter += 1
    if counter == arglen:
        raise LongestExhausted
    return fillvalue

def sentinel():
    yield ret()

Example code for having sentinel outside izip_longest in re your question from the comments:

def sentinel(counter, arglen, fillvalue):
    def ret():
        counter[0] += 1
        if counter[0] == arglen:
            raise LongestExhausted
        yield fillvalue
    return ret()


def izip_longest_modified_my(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')

    class LongestExhausted(Exception):
        pass

    fillers = repeat(fillvalue)
    counter = [0]
    arglen = len(args)
    iters = [chain(it, sentinel(counter, arglen, fillvalue), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except LongestExhausted:
        pass

Here you're again using a list just as a container to get around the problems accessing outer scopes in Python 2.

Community
  • 1
  • 1
agf
  • 171,228
  • 44
  • 289
  • 238
  • Thanks. Placing `global` into both `ret` and `izip_longest` did work! – ovgolovin Sep 12 '11 at 22:38
  • And what about my approach where I put it into `function_defaults`? Is it OK to do it this way? – ovgolovin Sep 12 '11 at 22:39
  • Thanks about pointing out that `ret` should be placed outside of `sentinel`. I think it's a really good suggestion. – ovgolovin Sep 12 '11 at 22:43
  • @ovgo It's certainly "OK" in that it works, but using a mutable default argument just as a container isn't considered good coding style. If you're just using the default to simulate a function attribute (which you are) just use a real function attribute. – agf Sep 12 '11 at 22:43
  • By the way, in the pure Python equivalent the author used a mutable default argument as a container. So, it seems to be not a good stile of coding to put on docs. They should be careful in that :) because a lot of people use documentation to learn Python. So docs have to be immaculate not to disseminate errors and bad code practice. – ovgolovin Sep 12 '11 at 22:57
  • About `sentinel.counter`. Is it OK that in this case the name of the function (in this case `sentinel`) is scattered all over the code? As I know, it's better not to use direct names (e.g. in classes we use `super` to access the ancestor). Or I wrong in this? – ovgolovin Sep 12 '11 at 22:59
  • Thank you for the answer, by the way. Now I know 3 ways to do that: 1. to use `nonlocal` in Python3. 2. To use `sentinel.counter` (function `__dict__`). 3. to use `global` in every function that uses `counter` to put `counter` in the `global scope` for it to be shared by all the functions. – ovgolovin Sep 12 '11 at 23:02
  • One more question about p.3: Is it OK to bloat the global scope in this way. As I know, all the functions pass through all the dictionaries, including `global scope` dict, so the performance must be decreasing. – ovgolovin Sep 12 '11 at 23:03
  • In classes we use `super` (sometimes; other times it's overkill or introduces other problems) because we don't necessarily know what our direct super-class is (multiple inheritance). Other times we try not to couple things unnecessarily by naming them directly. Here, `sentinel` is a local `def` inside the main function, and the rest of the function is heavily dependent on the existence and implementation of `sentinel`. Sometimes avoiding directly naming the object you know you want to reference is just obfuscation. – Ben Sep 12 '11 at 23:04
  • @ovgo The original _didn't_ use a mutable default argument just as a container -- they actually used the fact that the argument was mutable as part of their implementation. Your #2: Ben is right about the function attribute method. Your #3: Your intuition about `global` is right -- it's not a good idea to use a `global` for this, and it is slower than a more local lookup. – agf Sep 13 '11 at 00:06
  • @agf Is it possible to put `sentinel` function outside of `izip_longes` function? If so, we somehow must ensure that the counter of the needed `izip_longest` is used. I think (but not sure) that if `sentinel` is called from `izip_longes` then `izip_longes.counter` called from `sentinel` would access the variable which belongs to `izip_longest` which has called `sentinel`. Am I correct in my thinking? – ovgolovin Sep 13 '11 at 14:12
  • @ovgo No. The reason the function attribute works on `sentinel` is that a new `sentinel` is created every time you run `izip_longest`. There is only one `izip_longest` so using a function attribute on it would be equivalent to using a global variable, as would using a function attribute on `sentinal` if you moved it outside of `izip_longest`. – agf Sep 13 '11 at 14:34
  • @agf OK. Then I have two question. If we have only one instance of `izip_longest`, how come the we can run a few `izip_longest` with different parameters simultaneously? (I only now started understanding that we call the same instance of function, but with different parameters; there should be some magic behind the scenes). The second question is: Can we place `sentinel` outside `izip_longest`, but pass `count` as a parameter to `sentinel` which is created with `counter = [0]` before calling `sentinel`? – ovgolovin Sep 13 '11 at 15:59
  • 1
    #1: Think of it sort of like a class vs. a class instance. The `def` line and any function attributes are like the class itself, while the body of the function is like the class instance. So any function bodies that exist in parallel refer to the same default parameters and the same function name / attributes. #2: Yes. Since you're never assigning to `counter` in `sentinel` (only to the first item of `counter`), it will automatically search outer scopes for the name, and will see the `counter` created in `izip_longest`. I'll add example code for this to my answer. – agf Sep 13 '11 at 19:22
  • @agf: Great! Thank you! I think it's possible to put `def ret():` outside of `sentinel` and send the parameters to it in the same manner, thus there will be fewer creations of `ret()` since `sentinel` is called once for each parameter of the `izip_longest`. – ovgolovin Sep 13 '11 at 21:29
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/3424/discussion-between-ovgolovin-and-agf) – ovgolovin Sep 13 '11 at 21:29
  • @ovgo Of course you _can_, but you're turning a self-contained function, the `izip_longest` in the docs or in my answer to your other post, into a mess of separate functions. Organization matters, style matters. Keep `sentinel` and `ret` inside `izip_longest` (but `ret` outside of `sentinel`) and use a function attribute on `sentinel` rather than a list as a container, if you want to use your method of tracking the number of iterators which have been exhausted. – agf Sep 13 '11 at 22:24
1

Using a global is a bad idea, IMHO. You need to make sure to reset the counter properly between calls. But more seriously this is a generator; you don't even need threading to have multiple calls to a generator in flight at the same time, which will wreck havoc with any attempt to sanely use a global to keep track of state.

You could just explicitly pass a reference to a mutable object into sentinel, and then into ret. It looks like your code controls all the calls to them. Function parameters are the original and boring way of transferring references between scopes!

Ben
  • 68,572
  • 20
  • 126
  • 174
  • Can you elaborate on 'resetting counters between calls' and 'not needing threading'? I don't get the ideas. – ovgolovin Sep 12 '11 at 23:06
  • What happens if I write `a = izip_longest(...); do_stuff_with(a); b = izip_longest(...); do_stuff_with_b()`? The global counter needs to be reset at the end of `izip_longest`, otherwise it won't start from zero when I call `izip_longest` to get b. But that doesn't fix the problem, because I could have also done `a = izip_longest(...); b = izip_longest(...); do_stuff_with_a_and_b()`. Here now you have two calls to the generator running at the same time, trying to use the *same* global variable as a counter. Plus of course, you could have two different threads that both call `izip_longest`. – Ben Sep 13 '11 at 00:06
  • @ovgo He means that by using a global, if you've got multiple `izip_longest`s going at the same time, it won't work right, i.e. `izip_longest(gen1(), izip_longest(gen1(), gen2())`, because they'll share the same counter. The `nonlocal`, default argument, or function attribute methods will all work correctly in this situation, because the counter will be local to the generator created by each call to `izip_longest`. – agf Sep 13 '11 at 00:10
  • 1
    Ben, make sure to ping the person you're trying to talk to with an `@` so they see it. – agf Sep 13 '11 at 00:11
  • @agf: Oh, I thought that was just a disambiguation convention. Does it actually influence when StackOverflow notifies you about replies? – Ben Sep 13 '11 at 00:36
  • The person who's post you're replying to (in this case, you) is automatically notified. You can notify one additional person involved in the conversation with `@name`. – agf Sep 13 '11 at 00:37
  • @agf: And that works even with an abbreviation, as when you wrote "ovgo" for "ovolovin"? Neat. Thanks for the tip! – Ben Sep 13 '11 at 01:23
  • It tries to match based on the first three letters. If it's nonunique (it could refer to multiple people in the conversation), it looks at the rest. – agf Sep 13 '11 at 01:34
  • @Ben @agf Thanks! I understood why `global` variables can not be used here. I think it's very important. – ovgolovin Sep 13 '11 at 13:17
  • @Ben Two more questions. What did you mean by 'your code controls all the calls to them'? And why 'function parameters are **boring** way of transferring references between scopes'? What is boring with it? – ovgolovin Sep 13 '11 at 21:38
  • @ogolovin I meant that `ret` and `sentinel` are used purely internally in the implementation of your generator. If they were returned for user code to call, then it might not be so convenient to just add an extra argument to them. – Ben Sep 13 '11 at 23:05
  • @ogolvin About the "boring" remark, I was just being flippant, sorry. I meant that since you were having trouble with declaring a counter in the right place that it would "magically" work as you needed it to, maybe you should switch back to a simpler and more direct technique where you directly state to the Python interpreter what you want. To me, adding default arguments that you never intend to use as arguments always seems like a roundabout technique that does the job accidentally as a side effect, rather than being a clear statement of what the programmer intended. – Ben Sep 13 '11 at 23:08