How to prevent iterator getting exhausted?

Question

If I create two lists and zip them

a=[1,2,3]
b=[7,8,9]
z=zip(a,b)

Then I typecast z into two lists

l1=list(z)
l2=list(z)

Then the contents of l1 turn out to be fine, [(1,7),(2,8),(3,9)], but l2 is just [].

I guess this is the general behavior of Python with regard to iterables. But as a novice programmer migrating from the C family, this doesn't make sense to me. Why does it behave this way? And is there a way to get past this problem?

I mean, yeah in this particular example, I can just copy l1 into l2, but in general is there a way to 'reset' whatever Python uses to iterate 'z' after I iterate it once?

It's behaviour of *generators*, not all iterables. Lists, for example, are iterables, and you can call `list(a)` and get copies of `a` as much as you want. — Karl Knechtel, Jun 03 '12 at 01:55

senderle · Accepted Answer · 2012-06-03T13:30:14.400

There's no way to "reset" a generator. However, you can use itertools.tee to "copy" an iterator.

>>> import itertools
>>> z = zip(a, b)
>>> zip1, zip2 = itertools.tee(z)
>>> list(zip1)
[(1, 7), (2, 8), (3, 9)]
>>> list(zip2)
[(1, 7), (2, 8), (3, 9)]

This involves caching values, so it only makes sense if you're iterating through both iterators at about the same rate. (In other words, don't use it the way I have here!)
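To illustrate the "same rate" point, here is a minimal sketch in which the two copies from tee are advanced in lockstep, so the internal cache never holds more than one pending item. The names left, right and pairs are made up for this example.

```python
import itertools

a = [1, 2, 3]
b = [7, 8, 9]

# Split one zip iterator into two independent iterators.
left, right = itertools.tee(zip(a, b))

# Advance both copies together: each item is taken from `left`,
# then immediately from `right`, so tee buffers at most one item.
pairs = [(x, y) for x, y in zip(left, right)]

# Both copies are now exhausted, just like any other iterator.
leftover = list(left)
```

Calling list() on one copy before touching the other, as in the transcript above, forces tee to cache every item, which is no cheaper than building a list in the first place.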

Another approach is to pass around the generator function, and call it whenever you want to iterate it.

def gen(x):
    for i in range(x):
        yield i ** 2

def make_two_lists(gen):
    return list(gen()), list(gen())

But now you have to bind the arguments to the generator function when you pass it. You can use lambda for that, but a lot of people find lambda ugly. (Not me though! YMMV.)

>>> make_two_lists(lambda: gen(10))
([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
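If lambda is not to your taste, functools.partial from the standard library can bind the argument instead. This is only a sketch of the same idea; the parameter name make_iter is my own choice, not part of the answer.

```python
import functools

def gen(x):
    # Yield the squares 0, 1, 4, ... for i in range(x).
    for i in range(x):
        yield i ** 2

def make_two_lists(make_iter):
    # Call the factory twice to get two fresh, independent generators.
    return list(make_iter()), list(make_iter())

# partial(gen, 4) is a callable that runs gen(4) each time it is called.
first, second = make_two_lists(functools.partial(gen, 4))
```

Either way, what gets passed around is a recipe for creating the iterator rather than the iterator itself.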

I hope it goes without saying that under most circumstances, it's better just to make a list and copy it.

Also, as a more general way of explaining this behavior, consider this. The point of a generator is to produce a series of values, while maintaining some state between iterations. Now, at times, instead of simply iterating over a generator, you might want to do something like this:

z = zip(a, b)
while some_condition():
    fst = next(z, None)
    snd = next(z, None)
    do_some_things(fst, snd)
    if fst is None and snd is None:
        do_some_other_things()

Let's say this loop may or may not exhaust z. Now we have a generator in an indeterminate state! So it's important, at this point, that the behavior of a generator is restrained in a well-defined way. Although we don't know where the generator is in its output, we know that a) all subsequent accesses will produce later values in the series, and b) once it's "empty", we've gotten all the items in the series exactly once. The more ability we have to manipulate the state of z, the harder it is to reason about it, so it's best that we avoid situations that break those two promises.
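The two promises can be checked on a small example. This is just a sketch using the a and b from the question; the variable names are illustrative.

```python
a = [1, 2, 3]
b = [7, 8, 9]
z = zip(a, b)

# Partially consume the iterator.
first = next(z)

# Promise (a): later accesses only produce later values in the series.
rest = list(z)

# Promise (b): once empty, it stays empty; the default is returned.
after = next(z, None)
still_empty = list(z)
```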

Of course, as Joel Cornett points out below, it is possible to write a generator that accepts messages via the send method; and it would be possible to write a generator that could be reset using send. But note that in that case, all we can do is send a message. We can't directly manipulate the generator's state, and so all changes to the state of the generator are well-defined (by the generator itself -- assuming it was written correctly!). send is really for implementing coroutines, so I wouldn't use it for this purpose. Everyday generators almost never do anything with values sent to them -- I think for the very reasons I give above.
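For the curious, a resettable generator of that kind might look roughly like the sketch below. resettable_squares and the "reset" message are hypothetical names invented for this example, and the same caveat applies: this is a demonstration, not a recommendation.

```python
def resettable_squares(n):
    # Yield 0, 1, 4, ... and start over when sent the string "reset".
    i = 0
    while i < n:
        message = yield i ** 2
        if message == "reset":
            i = 0
        else:
            i += 1

g = resettable_squares(3)
before = [next(g), next(g)]

# send() resumes the generator and returns the next yielded value,
# so the first item after the reset comes back from send() itself.
restarted = g.send("reset")
after = list(g)
```

Note the awkwardness: the caller has to remember that send() already handed back the first value, and once the generator has finished it cannot be reset at all.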

This works, but is complex and overkill, IMO. The "Don't use it like I do" is a hint of that as well. :-) — Lennart Regebro, Jun 03 '12 at 07:14
@LennartRegebro, well, I think `tee` exists for a good reason, and it's the closest thing in the standard libs that I can think of to the functionality the OP has requested. I assume that the OP already knows it's possible to copy the list! — senderle, Jun 03 '12 at 13:04
@user1265125, consider my recent edit, which answers your question in more detail. — senderle, Jun 03 '12 at 13:33

score 4 · Answer 2 · answered Jun 03 '12 at 07:13

If you need two copies of the list, which you do if you need to modify them, then I suggest you make the list once, and then copy it:

a=[1,2,3]
b=[7,8,9]
l1 = list(zip(a,b))
l2 = l1[:]

Lennart Regebro

Yeah as I mentioned, I *can* simply copy the first list. I asked this question just because I wanted to be clear in my Python concepts. Thanks anyway! – user1265125 Jun 03 '12 at 18:51

Thomas Orozco · Answer 3 · 2012-06-03T11:07:44.297

2

Just create a list out of your iterator using list() once, and use it afterwards.

It just happens that zip returns an iterator, and an iterator can only be iterated once.

You can iterate a list as many times as you want.

edited Jun 03 '12 at 11:07

answered Jun 02 '12 at 21:39

It's not really "casting", but this is usually the best approach. Also, you've correctly identified that 'iterator' is a super-category including generators and sequences, so this gets my vote for the best answer. – Karl Knechtel Jun 03 '12 at 01:56
10+ years later: on the other hand, the iterator created by `zip` isn't actually a generator (at least nowadays), but an instance of a class that implements the iterator protocol. – Karl Knechtel Jan 07 '23 at 06:58

Levon · Answer 4 · 2012-06-03T02:02:45.343

1

No, there is no way to "reset them".

Generators generate their output once, one by one, on demand, and then are done when the output is exhausted.

Think of them like reading a file: once you are through, you'll have to start over if you want another go at the data.

If you need to keep the generator's output around, then consider storing it, for instance in a list, and re-using it as often as you need. (This is somewhat similar to the trade-off in Python 2 between xrange(), which produced its values lazily, and range(), which built a whole list of items in memory.)
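The file analogy can be made concrete with a small sketch. It uses io.StringIO as a stand-in for a real file; the difference is that a file object can be rewound with seek(0), while an iterator such as the one returned by zip has no equivalent.

```python
import io

# A file-like object: a second read returns nothing until we rewind.
f = io.StringIO("line 1\nline 2\n")
first_pass = f.read()
second_pass = f.read()
f.seek(0)
third_pass = f.read()

# An iterator has no seek(), so store its output if it is needed again.
z = zip([1, 2, 3], [7, 8, 9])
kept = list(z)
again = list(z)
```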

Updated: corrected terminology, temporary brain-outage ...

edited Jun 03 '12 at 02:02

answered Jun 02 '12 at 21:38

What you describe is basically done by `itertools.tee()`, as described in senderle's answer. +1 from me, though, as your discussion is relevant. – Eric O. Lebigot Apr 19 '13 at 08:32

score 0 · Answer 5 · edited May 23 '17 at 11:46

Yet another explanation. As a programmer, you probably understand the difference between classes and instances (i.e. objects). The official documentation lists zip() among the built-in functions, but it is actually a built-in class whose instances are iterators. You can check this in interactive mode:

>>> zip
<class 'zip'>

Classes are types, so the following should also be clear:

>>> type(zip)
<class 'type'>

Your z is an instance of that class, and you can think of calling zip() as calling the class constructor:

>>> a = [1, 2, 3]
>>> b = [7, 8, 9]
>>> z = zip(a, b)
>>> z
<zip object at 0x0000000002342AC8>
>>> type(z)
<class 'zip'>

z is an iterator (object) that holds iterators over a and b internally. Because its implementation is generic, z (or the zip class) has no means of resetting those inner iterators, whatever iterables they came from. Because of that, there is no way to reset z. The cleanest way to solve your concrete problem is to copy the list (as you mentioned in the question and as Lennart Regebro suggests). Another easy-to-follow way is to call zip(a, b) twice, constructing two z-like iterators that behave the same way from the start:

>>> lst1 = list(zip(a, b))
>>> lst2 = list(zip(a, b))

However, this does not give identical results in general. Think about a or b being one-off sequences generated from some current conditions (say, temperatures read from several thermometers).
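A rough sketch of that pitfall: read_temperatures below is a hypothetical stand-in for a sensor, simulated with a counter so that every call returns different readings.

```python
import itertools

# Simulated sensor state: each reading is one higher than the last.
_readings = itertools.count(20)

def read_temperatures(n):
    # Hypothetical sensor call: returns n fresh readings on every call.
    return [next(_readings) for _ in range(n)]

names = ["t1", "t2", "t3"]

# Calling zip() twice re-reads the sensors, so the results differ.
lst1 = list(zip(names, read_temperatures(3)))
lst2 = list(zip(names, read_temperatures(3)))

# Building the list once and copying it keeps both results identical.
lst3 = lst1[:]
```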
