The following Python code produces [(0, 0), (0, 7), ..., (0, 693)] instead of the expected list of tuples combining all of the multiples of 3 with all of the multiples of 7:

multiples_of_3 = (i*3 for i in range(100))
multiples_of_7 = (i*7 for i in range(100))
list((i,j) for i in multiples_of_3 for j in multiples_of_7)

This code fixes the problem:

list((i,j) for i in (i*3 for i in range(100)) for j in (i*7 for i in range(100)))

Questions:

  1. The generator object seems to play the role of an iterator instead of providing a new iterator object each time the generated sequence is to be enumerated. The latter strategy seems to be the one adopted by .NET LINQ query objects. Is there an elegant way to get around this?
  2. How come the second piece of code works? Shall I understand that the generator's iterator is not reset after looping through all multiples of 7?
  3. Don't you think that this behavior is counterintuitive, if not inconsistent?
Tarik
  • Read: [The Python yield keyword explained](http://stackoverflow.com/questions/231767/the-python-yield-keyword-explained). – Bakuriu Aug 28 '13 at 11:34

4 Answers

3

A generator object is an iterator, and therefore one-shot. It's not an iterable which can produce any number of independent iterators. This behavior is not something you can change with a switch somewhere, so any workaround amounts to either using an iterable (e.g. a list) instead of a generator or repeatedly constructing generators.

The second snippet does the latter. It is by definition equivalent to the loops

for i in (i*3 for i in range(100)):
    for j in (i*7 for i in range(100)):
        ...

Hopefully it isn't surprising that here, the latter generator expression is evaluated anew on each iteration of the outer loop.
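
As a minimal sketch of the difference (written for Python 3, with the range shortened to 3 purely to keep the result small), the first half below reuses one generator object in the inner loop, and the second half uses lists, which are iterables:

```python
# One-shot: the inner generator is exhausted during the first outer pass,
# so later outer iterations find nothing left to pair with.
multiples_of_3 = (i * 3 for i in range(3))
multiples_of_7 = (i * 7 for i in range(3))
broken = [(i, j) for i in multiples_of_3 for j in multiples_of_7]
print(broken)  # [(0, 0), (0, 7), (0, 14)]

# Workaround: lists are iterables, so every inner loop gets a fresh iterator.
threes = [i * 3 for i in range(3)]
sevens = [i * 7 for i in range(3)]
fixed = [(i, j) for i in threes for j in sevens]
print(len(fixed))  # 9
```

The list version gives up laziness, which may or may not matter depending on how large the sequences are.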

  • "A generator object is an iterator, and therefore one-shot. It's not an iterable which can produce any number of independent iterators." +1 for that. I kind of understood it upon asking my question. However, please, check my renewed question under the comment I just posted under user4815162342 response. – Tarik Aug 28 '13 at 12:05
  • @Tarik It's a bit subjective. I can see it being confusing coming from LINQ, but in my experience it's quite easy to avoid - I *know* that it works that way, but I can't recall having to write code differently because of this fact. I suppose it's easier if one *manually* advances the iterator. For example, consider `next(g); for x in g: ...` to skip the first item - this needs at least one extra line if `iter(g) is not g`. Note that all other iterators behave the same way, for better reasons, so changing generator expressions wouldn't fix this pitfall completely. –  Aug 28 '13 at 12:19
2

As you discovered, the object created by a generator expression is an iterator (more precisely a generator-iterator), designed to be consumed only once. If you need a resettable generator, simply create a real generator and use it in the loops:

def multiples_of_3():               # generator
    for i in range(100):
        yield i * 3

def multiples_of_7():               # generator
    for i in range(100):
        yield i * 7

list((i,j) for i in multiples_of_3() for j in multiples_of_7())

Your second code works because the expression list of the inner loop ((i*7 ...)) is evaluated on each pass of the outer loop. This results in creating a new generator-iterator each time around, which gives you the behavior you want, but at the expense of code clarity.

To understand what is going on, remember that there is no "resetting" of an iterator when the for loop iterates over it. (This is a feature; such a reset would break iterating over a large iterator in pieces, and it would be impossible for generators.) For example:

multiples_of_2 = iter(xrange(0, 100, 2))  # iterator
for i in multiples_of_2:
    print i
# prints nothing because the iterator is spent
for i in multiples_of_2:
    print i

...as opposed to this:

multiples_of_2 = xrange(0, 100, 2)        # iterable sequence; each for loop asks it for a new iterator
for i in multiples_of_2:
    print i
# prints again because a new iterator gets created
for i in multiples_of_2:
    print i

A generator expression is equivalent to an invoked generator and can therefore only be iterated over once.
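
That equivalence can be checked directly. The following is a small sketch for Python 3; the function name `evens` is made up for the illustration:

```python
def evens():
    # Generator function: each call returns a new generator-iterator.
    for i in range(0, 10, 2):
        yield i

gen_expr = (i for i in range(0, 10, 2))  # generator expression
called = evens()                         # invoked generator

# Both objects are iterators: iter() hands back the very same object.
same_expr = iter(gen_expr) is gen_expr
same_called = iter(called) is called
print(same_expr, same_called)

first_pass = list(gen_expr)
second_pass = list(gen_expr)  # the iterator is already spent
print(first_pass, second_pass)

# Calling the generator function again produces a fresh iterator.
fresh = list(evens())
print(fresh)
```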

user4815162342
  • Don't you think that generators would be more useful if they were iterables instead of iterators? In fact this is what bothers me the most. I am coming back from experimenting with Haskell and LINQ, which do lazy evaluation, thereby saving memory in the process. – Tarik Aug 28 '13 at 12:02
  • Generator functions *are* iterables in the general sense of the word - you convert them to an iterator by calling them. If by "generators" you're referring to *generator expressions*, they certainly could have been implemented as iterables whose `__iter__` calls the invisible underlying generator function and produces a new generator-iterator. But *that* would have been inconsistent with how normal generators work. (Generator expressions were introduced after the regular generator functions - the `def`s that contain a `yield` - were already a part of the language.) – user4815162342 Aug 28 '13 at 12:08
  • Since a generator is just syntactic sugar for what is in effect implemented by an underlying def, what would make it inconsistent if it was implemented using a def containing a yield? Look at lambda expressions: aren't they implemented as a def behind the scenes? Again, just syntactic sugar. The bottom line is that I still do not understand what we would lose by having generators implemented as iterables instead of iterators. – Tarik Aug 28 '13 at 12:26
  • @Tarik When you say "generator", do you refer to generator functions or to generator expressions? I assumed you meant specifically generator expressions - so in that case they would behave differently than named generators, hence the loss of consistency. Currently a generator expression is, as you say, syntactic sugar for an applied argument-less generator with a single `yield from`. Now, if you're referring to generators, sure - if generators had been implemented as iterables from the start, generator expressions would also be iterables, and everything would still be consistent. – user4815162342 Aug 28 '13 at 12:37
  • Thanks for your time and perseverance answering this question. I would love seeing your real name if not your photo on the site instead of userxxxx :-) Have a nice day! – Tarik Aug 28 '13 at 12:53
  • @Tarik You're welcome, and I hope you'll find Python fun to work with! I'll consider adding some personal info in the profile. :) – user4815162342 Aug 28 '13 at 13:07
  • Reading further into iterable and iterator, I found out that a generator expression has a __iter__() function defined, which makes it an iterable. That brings the question back on the table again. – Tarik Aug 29 '13 at 10:20
  • @Tarik Not really - *every* iterator has an `__iter__` that returns itself, otherwise they wouldn't be usable in contexts that do `iter(...)`, such as `for` loops. – user4815162342 Aug 29 '13 at 11:20
  • According to this link http://mail.python.org/pipermail/python-dev/2002-July/026287.html, there is no mechanism to determine if an iterable is single or multi pass. Both a list and a generator expression implement __iter__() and __next__() functions. The first function returns the iterator object that implements __next__. A single pass iterable returns itself as an iterator while a multipass iterable returns a distinct iterator each time __iter__ is called, that will independently keep track of the iteration state. The issue at hand is that iterables are not consistently implemented. – Tarik Aug 29 '13 at 12:21
  • At this point I find it hard to follow what it is you are arguing, exactly, and how "the issue at hand" pertains to your question or to my answer. While there's no *general* mechanism to determine whether an iterable is multipass or single-pass, an iterator whose `__iter__` returns itself is obviously single-pass, so you could use that to check, although again it's unclear how that would be useful for generator-iterators. For what it's worth, I see no inconsistency in iterators themselves being iterable. – user4815162342 Aug 29 '13 at 12:35
  • The inconsistency is in the fact that some out of the box iterables are implemented as single pass while others are implemented as multi-pass with no difference in the exposed interface that gives a hint to which kind it is implementing. A range is a multi-pass iterable while a generator expression is a single pass iterable. – Tarik Aug 29 '13 at 12:51
  • @Tarik As I said, the hint is when `__iter__` returns self. – user4815162342 Aug 29 '13 at 13:06
  • That could work, but it's an implicit and IMHO dirty way of determining it. It also implies that, unlike in other environments such as .NET, this is something you have to worry about and check before using an iterable. Neither the IDE nor the runtime would complain. If you overlook that check, you can end up with a logic error, as I did. – Tarik Aug 29 '13 at 13:15
  • @Tarik Well, you were looking for a "hint". :) – user4815162342 Aug 29 '13 at 13:22
1

The real issue, as I found out, is about single- versus multi-pass iterables and the fact that there is currently no standard mechanism to determine whether an iterable is single- or multi-pass. See: Single- vs. Multi-pass iterability
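
For what it's worth, the hint suggested in the comments above (an iterator's `__iter__` returns the object itself) can be turned into a rough check. This is only a heuristic sketch for Python 3, and `is_single_pass` is a hypothetical helper name, not a standard function:

```python
def is_single_pass(obj):
    """Heuristic only: an iterator returns itself from iter()."""
    return iter(obj) is obj

checks = {
    "list": is_single_pass([1, 2, 3]),
    "range": is_single_pass(range(3)),
    "generator expression": is_single_pass(i for i in range(3)),
    "string iterator": is_single_pass(iter("abc")),
}
for name, single in checks.items():
    print(name, "single-pass" if single else "multi-pass")
```

The check says nothing about iterables that return a new object from `iter()` yet still consume an underlying one-shot source, so it should be read as a hint rather than a guarantee.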

Tarik
1

If you want to convert a generator expression to a multipass iterable, then it can be done in a fairly routine fashion. For example:

class MultiPass(object):
    def __init__(self, initfunc):
        self.initfunc = initfunc
    def __iter__(self):
        return self.initfunc()

multiples_of_3 = MultiPass(lambda: (i*3 for i in range(20)))
multiples_of_7 = MultiPass(lambda: (i*7 for i in range(20)))
print list((i,j) for i in multiples_of_3 for j in multiples_of_7)

From the point of view of defining the thing it's a similar amount of work to typing:

def multiples_of_3():
    return (i*3 for i in range(20))

but from the point of view of the user, they write multiples_of_3 rather than multiples_of_3(), which means the object multiples_of_3 is polymorphic with any other iterable, such as a tuple or list.

The need to type lambda: is a bit inelegant, true. I don't suppose there would be any harm in introducing "iterable comprehensions" to the language, to give you what you want while maintaining backward compatibility. But there are only so many punctuation characters, and I doubt this would be considered worth one.
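
One way to avoid the `lambda:` without new syntax might be to apply the same class as a decorator to a generator function. This is a sketch for Python 3 under that assumption; it repeats the `MultiPass` class from above so that it runs on its own, and the decorator usage is an illustration rather than an established idiom:

```python
class MultiPass(object):
    def __init__(self, initfunc):
        self.initfunc = initfunc

    def __iter__(self):
        # Build a brand-new generator-iterator on every iteration request.
        return self.initfunc()

@MultiPass
def multiples_of_3():
    for i in range(3):
        yield i * 3

@MultiPass
def multiples_of_7():
    for i in range(3):
        yield i * 7

# The decorated names are MultiPass objects, so they are used without ().
pairs = [(i, j) for i in multiples_of_3 for j in multiples_of_7]
again = [(i, j) for i in multiples_of_3 for j in multiples_of_7]
print(len(pairs), pairs == again)
```

The trade-off is that the decorated name no longer refers to a callable, which could surprise a reader who expects `multiples_of_3()` to work.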

Steve Jessop
  • Thanks for adding interesting points to this discussion. Although I had this solution in mind, I still believe that iterables should consistently be multi-pass while iterators should be single-pass, regardless of how they are created (lambda expressions, generator functions, generator expressions...). – Tarik Sep 07 '13 at 19:39
  • @Tarik: that's not really possible, since you want to be able to supply in place of an iterable, something that inherently is single-pass (for example, a file object reading from stdin or a socket). That's why iterators are themselves iterable (in some cases just returning `self`), to support that polymorphism. – Steve Jessop Sep 07 '13 at 20:07