103

I can't figure out how to look ahead one element in a Python generator. As soon as I look it's gone.

Here is what I mean:

gen = iter([1,2,3])
next_value = gen.next()  # okay, I looked forward and see that next_value = 1
# but now:
list(gen)  # is [2, 3]  -- the first value is gone!

Here is a more real example:

gen = element_generator()
if gen.next_value() == 'STOP':
  quit_application()
else:
  process(gen.next())

Can anyone help me write a generator that you can look one element forward?


See also: Resetting generator object in Python

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
bodacydo
  • 75,521
  • 93
  • 229
  • 319
  • 1
    Can you describe in more detail what you want to do? Code sample perhaps? – Tim Pietzcker Mar 11 '10 at 13:41
  • if you have an existing list, what else do you need? Also, it seems that you saving the first value as `next_value`, no? – SilentGhost Mar 11 '10 at 13:43
  • SilentGhost, it was an example to illustrate what `gone` means. I don't have a list and I don't have next_value. It was just an example to show what it means for an element to disappear from a generator. – bodacydo Mar 11 '10 at 13:44
  • @bodacydo: I still don't understand. **How** is it gone then? Why don't you have access to that value? – SilentGhost Mar 11 '10 at 13:47
  • Tim, updated the question with a better example. – bodacydo Mar 11 '10 at 13:47
  • SilentGhost, I didn't look into generator - I took the value out of it, looked at it, but didn't put back. So it's gone from generator. I just want to look in it. See updated post. :) – bodacydo Mar 11 '10 at 13:48
  • I still don't see what that's supposed to be good for. You want to avoid calling `gen.next()` if the generator is "exhausted". That's what the exception `StopIteration` is for. What is in the generator after this element "STOP"? Why isn't the generator simply exhausted at this point? Or, if that's not possible, why not have the generator `raise StopIteration` if the current element is "Stop"? – Tim Pietzcker Mar 11 '10 at 14:03
  • -1: "I don't have a list and I don't have next_value." "I don't know". It's very hard to answer a question like this. – S.Lott Mar 11 '10 at 15:10
  • I've come around to thinking that the correct approach is to modify your algorithm to use the 'current' and 'previous' value from the generator, rather than trying to use the 'next' and 'current'. I don't believe there is any algorithm that can't be recast this way, which is much simpler than any of the solutions offered (including both of mine.) – Jonathan Hartley Sep 03 '12 at 17:52
  • I consider it a limitation of Python if it forces you to change a perfectly reasonable algorithm in order for it to work. But hey, that falls out of the philosophy of the language itself. – Steven Lu Jun 05 '13 at 04:24
  • @StevenLu That's crazy! It's not the language, it's the OP's bad idea of having two different readers of an unseekable stream, then compensating for this bad design by trying to make them not tread on each other's toes. Have a single reader from the generator, rather than two, and all these problems melt away into zero required code. – Jonathan Hartley Apr 25 '14 at 09:39

18 Answers18

103

For sake of completeness, the more-itertools package (which should probably be part of any Python programmer's toolbox) includes a peekable wrapper that implements this behavior. As the code example in the documentation shows:

>>> p = peekable(['a', 'b'])
>>> p.peek()
'a'
>>> next(p)
'a'

However, it's often possible to rewrite code that would use this functionality so that it doesn't actually need it. For example, your realistic code sample from the question could be written like this:

gen = element_generator()
command = gen.next_value()
if command == 'STOP':
  quit_application()
else:
  process(command)

(reader's note: I've preserved the syntax in the example from the question as of when I'm writing this, even though it refers to an outdated version of Python)

David Z
  • 128,184
  • 27
  • 255
  • 279
77

The Python generator API is one way: You can't push back elements you've read. But you can create a new iterator using the itertools module and prepend the element:

import itertools

gen = iter([1,2,3])
peek = gen.next()
print list(itertools.chain([peek], gen))
Aaron Digulla
  • 321,842
  • 108
  • 597
  • 820
  • 9
    You can use [`send`](http://docs.python.org/2/reference/expressions.html#generator) to push a previously yielded value back into a generator as it yields the next value. – dansalmo Dec 26 '13 at 03:56
  • 3
    @dansalmo: Yes, but you need to modify the generator code for this. See the answer by Andrew Hare. – Aaron Digulla Jan 08 '14 at 09:23
  • 10
    I've used this solution many times, but I think it probably should be pointed out that you basically call `itertools.chain.__next__` `n` times for each element that you get out of the iterable (where `n` is the number of times you've peeked). This works great for one or two peeks, but if you need to peek at every element, this isn't the best solution :-) – mgilson Mar 10 '16 at 18:46
  • 10
    I'd mention that this is implemented in the `more-itertools` package as [`spy`](https://more-itertools.readthedocs.io/en/latest/api.html#more_itertools.spy). Not to say it's worth bringing in a whole new package for just this one piece of functionality, but some people may find an existing implementation useful. – David Z Mar 22 '17 at 22:20
  • 3
    @mgilson Yeah, this should definitely come with a warning. People might very well try to do this in a loop, peeking at each element, and then the whole iteration takes quadratic time. – Kelly Bundy Jan 24 '20 at 20:24
  • @KellyBundy Why would it be quadratic time to peek at each element? `itertools.chain` runs in constant time, no? – Eric Auld May 03 '22 at 12:24
  • 1
    @EricAuld If you want to use this, i.e., use `chain` to prepend a peeked value back to the iterable, you'd build a stack of chains and every value would go through the whole stack. [Demo](https://tio.run/##VVBBjsIwDLznFT6mUg@wXFClPmDPuze0WgVwqUXjRMbVwutLkkJh5xJ5xh5PHG/aB95so0xTJ8GDkkdSIB@D6KMyRSFF0RCGy1M89I7YmCN2EBHPF8tVYyAhzbel3YrjEya@KvxfTwPCt4w492Wo3F5FRrZK44xXtaTVouH1gFHhS0P8TNZOKfD/SUEdhReqpCgZ7S6b/tTw7rcsMF0Q@AVimNNuHr/INGd6vUqo4SM/b8GT@3weOzi/P7pmuUINPPo9Srt@rYtCrJZr0Gqa7g) where doubling the length indeed quadruples the time. – Kelly Bundy May 03 '22 at 12:41
  • You may want to wrap that with a try-except StopIteration to cover the case where the iterator is empty - otherwise this will raise – Philippe Hebert Oct 29 '22 at 15:25
24

Ok - two years too late - but I came across this question, and did not find any of the answers to my satisfaction. Came up with this meta generator:

class Peekorator(object):

    def __init__(self, generator):
        self.empty = False
        self.peek = None
        self.generator = generator
        try:
            self.peek = self.generator.next()
        except StopIteration:
            self.empty = True

    def __iter__(self):
        return self

    def next(self):
        """
        Return the self.peek element, or raise StopIteration
        if empty
        """
        if self.empty:
            raise StopIteration()
        to_return = self.peek
        try:
            self.peek = self.generator.next()
        except StopIteration:
            self.peek = None
            self.empty = True
        return to_return

def simple_iterator():
    for x in range(10):
        yield x*3

pkr = Peekorator(simple_iterator())
for i in pkr:
    print i, pkr.peek, pkr.empty

results in:

0 3 False
3 6 False
6 9 False
9 12 False    
...
24 27 False
27 None False

i.e. you have at any moment during iteration access to the next item in the list.

plof
  • 1,294
  • 1
  • 9
  • 7
  • 5
    I feel a bit mean saying this but I find this solution horrendous & quite error-prone. At any moment in time, you need access to two items from the generator: the 'i' and 'i+1' elements. Why not code your algorithm to use the current and the previous value, instead of the next and the current value? It seems absolutely identical, and much simpler than this. – Jonathan Hartley Sep 03 '12 at 17:49
  • 1
    by all means - be as mean as you need to :) – plof Oct 23 '12 at 20:39
  • 8
    @Jonathan this may not always be possible in non-trivial examples, eg when the iterator gets passed into a function. – Florian Ledermann Feb 06 '13 at 13:42
  • 4
    Someone should point out that from python2.6 onward, the preferred way to get the next value of a generator is `next(generator)` rather than `generator.next()`. IIRC, `generator.next()` goes away in python3.x. Similarly, for best forward compatibility, add `__next__ = next` into the body of the class so that it continues to work in python3.x. That said, great answer. – mgilson Mar 10 '16 at 18:42
  • Echoing @mgilson, this doesn't work in Python 3 if the generator is a string iterator. For that you absolutely need to use `next()` – jpyams Feb 16 '18 at 15:35
20

Using itertools.tee will produce a lightweight copy of the generator; then peeking ahead at one copy will not affect the second copy. Thus:

import itertools

def process(seq):
    peeker, items = itertools.tee(seq)
    
    # initial peek ahead
    # so that peeker is one ahead of items
    if next(peeker) == 'STOP':
        return
    
    for item in items:
    
        # peek ahead
        if next(peeker) == "STOP":
            return
    
        # process items
        print(item)

The items generator is unaffected by modifications to peeker. However, modifying seq after the call to tee may cause problems.

That said: any algorithm that requires looking an item ahead in a generator could instead be written to use the current generator item and the previous item. This will result in simpler code - see my other answer to this question.

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
Jonathan Hartley
  • 15,462
  • 9
  • 79
  • 80
  • 4
    "Any algorithm that requires you to look 1 item ahead in a generator could alternatively be written to use the current generator item and the previous item." Mangling your use of generators can sometimes lead to more elegant and readable code, especially in parsers that require lookahead. – Rufflewind Feb 04 '16 at 21:51
  • Hey there Rufflewind. I understand the point about parsing requiring lookahead, but I don't see why you can't achieve that by simply storing the previous item out of your generator, and using the most recent item out of your generator as the lookahead. Then you get the best of both worlds: unmangled generator, and simple parser. – Jonathan Hartley Feb 05 '16 at 01:55
  • Well, that's why you wrap the generator in a custom class to automatically do this. – Rufflewind Feb 05 '16 at 03:37
  • Hey Ruffelwind. I'm no longer sure that I understand what you're advocating. Sorry to have lost the plot. – Jonathan Hartley Feb 05 '16 at 20:37
  • This is bad idea - the existance of `copy1` causes you to buffer the entire iterator. – Eric May 11 '18 at 18:41
  • @Eric Hey Eric. you're right insomuch as I wrote the demo code wrong - my intention was that the call to 'copy1.next()' would happen within the loop that processes copy2, hence both copies move through the sequence together, so only one item is buffered at any time. I'll fix the code, thanks for pointing out it was wrong. – Jonathan Hartley May 12 '18 at 14:58
  • 1
    FWIW, code is now fixed, @Eric\ May's comment that the whole iterator is buffered is no longer true. – Jonathan Hartley May 15 '18 at 12:31
10

An iterator that allows peeking at the next element and also further ahead. It reads ahead as needed and remembers the values in a deque.

from collections import deque

class PeekIterator:

    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.peeked = deque()

    def __iter__(self):
        return self

    def __next__(self):
        if self.peeked:
            return self.peeked.popleft()
        return next(self.iterator)

    def peek(self, ahead=0):
        while len(self.peeked) <= ahead:
            self.peeked.append(next(self.iterator))
        return self.peeked[ahead]

Demo:

>>> it = PeekIterator(range(10))
>>> it.peek()
0
>>> it.peek(5)
5
>>> it.peek(13)
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    it.peek(13)
  File "[...]", line 15, in peek
    self.peeked.append(next(self.iterator))
StopIteration
>>> it.peek(2)
2
>>> next(it)
0
>>> it.peek(2)
3
>>> list(it)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
Stefan Pochmann
  • 27,593
  • 8
  • 44
  • 107
7

A simple solution is to use a function like this:

def peek(it):
    first = next(it)
    return first, itertools.chain([first], it)

Then you can do:

>>> it = iter(range(10))
>>> x, it = peek(it)
>>> x
0
>>> next(it)
0
>>> next(it)
1
Thomas Ahle
  • 30,774
  • 21
  • 92
  • 114
6
>>> gen = iter(range(10))
>>> peek = next(gen)
>>> peek
0
>>> gen = (value for g in ([peek], gen) for value in g)
>>> list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Rusty Rob
  • 16,489
  • 8
  • 100
  • 116
  • do you mind providing an explanation about what is happening here – Kristof Pal Feb 21 '14 at 20:08
  • We take peek off gen. We then create an iterable [peek] and combine it with the rest of gen to create a new gen. This is done by iterating through the flattening of the two generators which combine to give the original. See flatting: http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python – Rusty Rob Feb 22 '14 at 00:22
  • 1
    This is the same, but more explicit than the itertools.chain solution. – Theo Belaire Apr 24 '14 at 19:03
6

Just for fun, I created an implementation of a lookahead class based on the suggestion by Aaron:

import itertools

class lookahead_chain(object):
    def __init__(self, it):
        self._it = iter(it)

    def __iter__(self):
        return self

    def next(self):
        return next(self._it)

    def peek(self, default=None, _chain=itertools.chain):
        it = self._it
        try:
            v = self._it.next()
            self._it = _chain((v,), it)
            return v
        except StopIteration:
            return default

lookahead = lookahead_chain

With this, the following will work:

>>> t = lookahead(xrange(8))
>>> list(itertools.islice(t, 3))
[0, 1, 2]
>>> t.peek()
3
>>> list(itertools.islice(t, 3))
[3, 4, 5]

With this implementation it is a bad idea to call peek many times in a row...

While looking at the CPython source code I just found a better way which is both shorter and more efficient:

class lookahead_tee(object):
    def __init__(self, it):
        self._it, = itertools.tee(it, 1)

    def __iter__(self):
        return self._it

    def peek(self, default=None):
        try:
            return self._it.__copy__().next()
        except StopIteration:
            return default

lookahead = lookahead_tee

Usage is the same as above but you won't pay a price here to use peek many times in a row. With a few more lines you can also look ahead more than one item in the iterator (up to available RAM).

Bluehorn
  • 2,956
  • 2
  • 22
  • 29
4

If anybody is interested, and please correct me if I am wrong, but I believe it is pretty easy to add some push back functionality to any iterator.

class Back_pushable_iterator:
    """Class whose constructor takes an iterator as its only parameter, and
    returns an iterator that behaves in the same way, with added push back
    functionality.

    The idea is to be able to push back elements that need to be retrieved once
    more with the iterator semantics. This is particularly useful to implement
    LL(k) parsers that need k tokens of lookahead. Lookahead or push back is
    really a matter of perspective. The pushing back strategy allows a clean
    parser implementation based on recursive parser functions.

    The invoker of this class takes care of storing the elements that should be
    pushed back. A consequence of this is that any elements can be "pushed
    back", even elements that have never been retrieved from the iterator.
    The elements that are pushed back are then retrieved through the iterator
    interface in a LIFO-manner (as should logically be expected).

    This class works for any iterator but is especially meaningful for a
    generator iterator, which offers no obvious push back ability.

    In the LL(k) case mentioned above, the tokenizer can be implemented by a
    standard generator function (clean and simple), that is completed by this
    class for the needs of the actual parser.
    """
    def __init__(self, iterator):
        self.iterator = iterator
        self.pushed_back = []

    def __iter__(self):
        return self

    def __next__(self):
        if self.pushed_back:
            return self.pushed_back.pop()
        else:
            return next(self.iterator)

    def push_back(self, element):
        self.pushed_back.append(element)
it = Back_pushable_iterator(x for x in range(10))

x = next(it) # 0
print(x)
it.push_back(x)
x = next(it) # 0
print(x)
x = next(it) # 1
print(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
it.push_back(y)
it.push_back(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)

for x in it:
    print(x) # 4-9
Eric
  • 95,302
  • 53
  • 242
  • 374
nilo
  • 818
  • 8
  • 20
3

Instead of using items (i, i+1), where 'i' is the current item and i+1 is the 'peek ahead' version, you should be using (i-1, i), where 'i-1' is the previous version from the generator.

Tweaking your algorithm this way will produce something that is identical to what you currently have, apart from the extra needless complexity of trying to 'peek ahead'.

Peeking ahead is a mistake, and you should not be doing it.

Jonathan Hartley
  • 15,462
  • 9
  • 79
  • 80
  • 1
    You need to take an item out of a generator before you know if you want it. Say a function takes an item from a generator, upon inspection decides it doesn't want it. The next user of the generator won't see that item unless you can push it back. Peeking removes to need to push items back. – Isaac Turner Nov 27 '15 at 12:47
  • @IsaacTurner No, you don't need to do that. For example, you could have two nested generators. The inner one takes an item, decides it doesn't want to do anything with it, then yields it regardless. The outer one still sees everything in the sequence. There are equivalent, very simple, ways to do the same thing without nested generators. Just remember the 'previous item' in a variable and you can do anything that is requested by this question. MUCH simpler than trying to push things back. – Jonathan Hartley Nov 28 '15 at 04:48
3

This will work -- it buffers an item and calls a function with each item and the next item in the sequence.

Your requirements are murky on what happens at the end of the sequence. What does "look ahead" mean when you're at the last one?

def process_with_lookahead( iterable, aFunction ):
    prev= iterable.next()
    for item in iterable:
        aFunction( prev, item )
        prev= item
    aFunction( item, None )

def someLookaheadFunction( item, next_item ):
    print item, next_item
S.Lott
  • 384,516
  • 81
  • 508
  • 779
1

Although itertools.chain() is the natural tool for the job here, beware of loops like this:

for elem in gen:
    ...
    peek = next(gen)
    gen = itertools.chain([peek], gen)

...Because this will consume a linearly growing amount of memory, and eventually grind to a halt. (This code essentially seems to create a linked list, one node per chain() call.) I know this not because I inspected the libs but because this just resulted in a major slowdown of my program - getting rid of the gen = itertools.chain([peek], gen) line sped it up again. (Python 3.3)

Jacob Eliosoff
  • 119
  • 1
  • 2
1

Python3 snippet for @jonathan-hartley answer:

def peek(iterator, eoi=None):
    iterator = iter(iterator)

    try:
        prev = next(iterator)
    except StopIteration:
        return iterator

    for elm in iterator:
        yield prev, elm
        prev = elm

    yield prev, eoi


for curr, nxt in peek(range(10)):
    print((curr, nxt))

# (0, 1)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)
# (5, 6)
# (6, 7)
# (7, 8)
# (8, 9)
# (9, None)

It'd be straightforward to create a class that does this on __iter__ and yields just the prev item and put the elm in some attribute.

nitely
  • 2,208
  • 1
  • 22
  • 23
  • An issue with this approach is that you fetch the next element one step ahead of time, which might be undesirable. For instance if fetching an element is slow or has side effects. – Maëlan Jun 03 '22 at 13:49
  • @Maëlan that's going to be an issue regardless. There's no way to find out what an element's value is, without in some sense fetching it. – Karl Knechtel Jan 09 '23 at 05:19
  • @KarlKnechtel there are situations where you decide to stop the iteration based on the value of the last fetched element, or on other context. You don’t need and don’t want to fetch another one. – Maëlan Jan 09 '23 at 10:04
  • In that case, write the corresponding logic so that it occurs in the previous loop iteration? Though in some special cases that might require special-casing to check the first element before starting a loop... – Karl Knechtel Jan 09 '23 at 22:28
1

w.r.t @David Z's post, the newer seekable tool can reset a wrapped iterator to a prior position.

>>> s = mit.seekable(range(3))
>>> s.next()
# 0

>>> s.seek(0)                                              # reset iterator
>>> s.next()
# 0

>>> s.next()
# 1

>>> s.seek(1)
>>> s.next()
# 1

>>> next(s)
# 2
pylang
  • 40,867
  • 14
  • 129
  • 121
1

cytoolz has a peek function.

>> from cytoolz import peek
>> gen = iter([1,2,3])
>> first, continuation = peek(gen)
>> first
1
>> list(continuation)
[1, 2, 3]
W.P. McNeill
  • 16,336
  • 12
  • 75
  • 111
1

In my case, I need a generator where I could queue back to generator the data I have just got via next() call.

The way I handle this problem, is to create a queue. In the implementation of the generator, I would first check the queue: if queue is not empty, the "yield" will return the values in queue, or otherwise the values in normal way.

import queue


def gen1(n, q):
    i = 0
    while True:
        if not q.empty():
            yield q.get()
        else:
            yield i
            i = i + 1
            if i >= n:
                if not q.empty():
                    yield q.get()
                break


q = queue.Queue()

f = gen1(2, q)

i = next(f)
print(i)
i = next(f)
print(i)
q.put(i) # put back the value I have just got for following 'next' call
i = next(f)
print(i)

running

python3 gen_test.py
0
1
1

This concept is very useful when I was writing a parser, which needs to look the file line by line, if the line appears to belong to next phase of parsing, I could just queue back to the generator so that the next phase of code could parse it correctly without handling complex state.

pandy.song
  • 51
  • 3
0

For those of you who embrace frugality and one-liners, I present to you a one-liner that allows one to look ahead in an iterable (this only works in Python 3.8 and above):

>>> import itertools as it
>>> peek = lambda iterable, n=1: it.islice(zip(it.chain((t := it.tee(iterable))[0], [None] * n), it.chain([None] * n, t[1])), n, None)
>>> for lookahead, element in peek(range(10)):
...     print(lookahead, element)
1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
None 9
>>> for lookahead, element in peek(range(10), 2):
...     print(lookahead, element)
2 0
3 1
4 2
5 3
6 4
7 5
8 6
9 7
None 8
None 9

This method is space-efficient by avoiding copying the iterator multiple times. It is also fast due to how it lazily generates elements. Finally, as a cherry on top, you can look ahead an arbitrary number of elements.

Pradhyum R
  • 113
  • 2
  • 12
0

An algorithm that works by "peeking" at the next element in a generator could equivalently be one that works by remembering the previous element, treating that element as the one to operate upon, and treating the "current" element as simply "peeked at".

Either way, what is really happening is that the algorithm considers overlapping pairs from the generator. The itertools.tee recipe will work fine - and it is not hard to see that it is essentially a refactored version of Jonathan Hartley's approach:

from itertools import tee
# From https://docs.python.org/3/library/itertools.html#itertools.pairwise
# In 3.10 and up, this is directly supplied by the `itertools` module.
def pairwise(iterable):
    # pairwise('ABCDEFG') --> AB BC CD DE EF FG
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def process(seq):
    for to_process, lookahead in pairwise(seq):
        # peek ahead
        if lookahead == "STOP":
            return
        # process items
        print(to_process)
Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153