218

I have a generator object returned by a function with multiple `yield` statements. Preparing to call this generator is a rather time-consuming operation. That is why I want to reuse the generator several times.

y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)

Of course, I'm aware that I could copy the contents into a simple list. Is there a way to reset my generator?


See also: How to look ahead one element (peek) in a Python generator?

Karl Knechtel
Dewfy
  • This is a special case of [How can I iterate twice over the same data with a given iterator?](https://stackoverflow.com/questions/25336726/how-can-i-iterate-twice-over-the-same-data-with-a-given-iterator), but there seem to be *some* generator-specific ways to address the problem. – Karl Knechtel Jan 07 '23 at 06:01

18 Answers

196

Generators can't be rewound. You have the following options:

  1. Run the generator function again, restarting the generation:

    y = FunctionWithYield()
    for x in y: print(x)
    y = FunctionWithYield()
    for x in y: print(x)
    
  2. Store the generator results in a data structure in memory or on disk which you can iterate over again:

    y = list(FunctionWithYield())
    for x in y: print(x)
    # can iterate again:
    for x in y: print(x)
    

The downside of option 1 is that it computes the values again. If that's CPU-intensive, you end up calculating twice. On the other hand, the downside of option 2 is the storage: the entire list of values will be stored in memory. If there are too many values, that can be impractical.

So you have the classic memory vs. processing tradeoff. I can't imagine a way of rewinding the generator without either storing the values or calculating them again.

You could also use tee as suggested by other answers, however that would still store the entire list in memory in your case, so it would be the same results and similar performance to option 2.

nosklo
  • Maybe there exists a way to save the signature of the function call? FunctionWithYield, param1, param2... – Dewfy Aug 13 '09 at 11:29
  • 4
    @Dewfy: sure: def call_my_func(): return FunctionWithYield(param1, param2) – nosklo Aug 13 '09 at 12:44
  • @Dewfy What do you mean by "save signature of function call"? Could you please explain? Do you mean saving the parameters passed to the generator? – Андрей Беньковский Dec 29 '15 at 21:32
  • @АндрейБеньковский - See the answer of **nosklo** – Dewfy Dec 30 '15 at 10:22
  • 2
    Another downside of (1) is also that FunctionWithYield() can be not only costly, but *impossible* to re-calculate, e.g. if it is reading from stdin. – Max Jan 24 '19 at 22:16
  • 2
    To echo what @Max said, if the function's output might (or will) change between calls, (1) may give unexpected and/or undesirable results. – Sam_Butler Apr 09 '19 at 13:35
  • I suspect they don't just hand you an easy way to reset any iterator for these very reasons. People will expect the same object to have the same behavior if the code doesn't explicitly indicate any changes (ie different or mutable input), so a reset is only proper if the iterator's behavior won't change no matter what (ie straightforward output of Fibonacci sequence). If an iterative algorithm's output is so large that you hesitate to store it, then you also don't want to waste time re-doing it just to find the i-th element; the only better alternative is to find a non-iterative algorithm. – BatWannaBe Apr 12 '19 at 23:05
  • Option 3: save the result to a file. You only iterate once, and you do not blow up working memory. Downside is disk space and the not-always-trivial cost of ensuring the output is properly delimited/escaped so as to be machine readable. – spioter Apr 18 '21 at 16:59
  • Using `tee` is a special case of storing the results - it just only stores the results that have been reached thus far in the iteration, rather than iterating over and storing the entire output eagerly. – Karl Knechtel Jan 08 '23 at 22:40
  • @spioter this will generally happen anyway if you "blow up working memory", via virtual memory. It will be saved to a *swap* file, as raw data (with all the same pointers etc.) that doesn't require any user-defined formatting. – Karl Knechtel Jan 08 '23 at 22:41
163

Another option is to use the itertools.tee() function to create a second version of your generator:

import itertools
y = FunctionWithYield()
y, y_backup = itertools.tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)

This could be beneficial from a memory-usage point of view if the original iteration might not process all the items.
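For instance, a minimal sketch (using a stand-in generator, since FunctionWithYield isn't defined here): if the first pass stops early, tee only buffers the items the leading iterator has actually consumed, whereas list() would realize everything up front.

import itertools

def numbers():  # hypothetical stand-in for FunctionWithYield
    for i in range(1000000):
        yield i

a, b = itertools.tee(numbers())
for x in itertools.islice(a, 3):  # consume only the first 3 items
    print(x)
# only those 3 items were buffered; b replays them, then continues lazily
for x in itertools.islice(b, 5):
    print(x)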

StevenWernerCS
Ants Aasma
  • 42
    If you're wondering about what it will do in this case, it's essentially caching elements in the list. So you might as well use `y = list(y)` with the rest of your code unchanged. – ilya n. Aug 13 '09 at 12:35
  • 5
    tee() will create a list internally to store the data, so that's the same as I did in my answer. – nosklo Aug 13 '09 at 12:43
  • 8
    Look at the implementation (http://docs.python.org/library/itertools.html#itertools.tee) - it uses a lazy-loading strategy, so items are copied to the list only on demand – Dewfy Aug 13 '09 at 13:23
  • 12
    @Dewfy: Which will be **slower** since all items will have to be copied anyway. – nosklo Aug 13 '09 at 17:28
  • 10
    yes, list() is better in this case. tee is only useful if you are not consuming the entire list – gravitation Aug 09 '10 at 18:22
  • 6
    `tee()` is not my cup of tee. Why not transform `y` into a function: `y = lambda: FunctionWithYield()`, and then `for x in y():` – jeromerg Aug 02 '17 at 18:12
  • 3
    @jeromerg this has to be pythonic zen: `gl = lambda: ((x,y) for x in range(10) for y in range(x))` Then `list(gl())` as many times as you like. +1, ser. – fbicknel Oct 05 '18 at 21:27
  • 3
    `tee` works with infinite iterables. `list` doesn't – Clément Feb 24 '22 at 19:58
43
>>> def gen():
...     def init():
...         return 0
...     i = init()
...     while True:
...         val = (yield i)
...         if val == 'restart':
...             i = init()
...         else:
...             i += 1

>>> g = gen()
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> g.send('restart')
0
>>> next(g)
1
>>> next(g)
2
fbicknel
aaab
  • 2
    It has 2 drawbacks: 1) you can't exhaust it until StopIteration, and 2) it doesn't work with any generator (for instance range) – Eric Sep 02 '20 at 16:49
33

Probably the simplest solution is to wrap the expensive part in an object and pass it to the generator:

data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass

This way, you can cache the expensive calculations.
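A minimal sketch of this pattern (the names here are hypothetical stand-ins for your own setup code and generator):

def expensive_setup():              # hypothetical stand-in for ExpensiveSetup
    print("expensive work runs once")
    return [1, 2, 3]

def function_with_yield(data):      # hypothetical stand-in for FunctionWithYield
    for item in data:
        yield item * 2

data = expensive_setup()            # the expensive part is paid exactly once
for x in function_with_yield(data): print(x)
for x in function_with_yield(data): print(x)  # data is reused, not recomputed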

If you can keep all results in RAM at the same time, then use list() to materialize the results of the generator in a plain list and work with that.

Aaron Digulla
32

I want to offer a different solution to an old problem

class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))

for x in squares: print(x)
for x in squares: print(x)

The benefit of this when compared to something like list(iterator) is that this is O(1) space complexity and list(iterator) is O(n). The disadvantage is that, if you only have access to the iterator, but not the function that produced the iterator, then you cannot use this method. For example, it might seem reasonable to do the following, but it will not work.

g = (x * x for x in range(5))

squares = IterableAdapter(lambda: g)

for x in squares: print(x)
for x in squares: print(x)
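Conversely, a sketch of what does work for the question's case (assuming FunctionWithYield takes no arguments): hand the adapter the factory itself, not a generator object it produced.

y = IterableAdapter(FunctionWithYield)  # a zero-argument factory
for x in y: print(x)  # __iter__ builds a fresh generator for each pass
for x in y: print(x)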
michaelsnowden
  • @Dewfy In the first snippet, the generator is on the line "squares = ...". Generator expressions behave the same way as calling a function that uses yield, and I only used one because it's less verbose than writing a function with yield for such a short example. In the second snippet, I've used FunctionWithYield as the generator_factory, so it will be called whenever __iter__ is called, which is whenever I write "for x in y". – michaelsnowden Sep 19 '16 at 14:23
  • Good solution. This actually makes a stateless iterable object instead of a stateful iterator object, so the object itself is reusable. Especially useful if you want to pass an iterable object to a function and that function will use the object multiple times. – Cosyn Feb 14 '18 at 01:45
7

Using a wrapper function to handle StopIteration

You could write a simple wrapper function around your generator-generating function that tracks when the generator is exhausted. It will do so using the StopIteration exception a generator throws when it reaches the end of iteration.

import types

def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        while True:
            try:
                yield next(generator)
            except StopIteration:
                # the underlying generator is exhausted:
                # re-initialize it with a fresh call and keep going
                generator = function(**kwargs)
                yield next(generator)
    return inner_func

As you can spot above, when our wrapper function catches a StopIteration exception, it simply re-initializes the generator object (using another instance of the function call) and keeps yielding from the start. Note that the wrapped stream therefore never ends by itself.

And then, assuming you define your generator-supplying function somewhere as below, you could use the Python function decorator syntax to wrap it implicitly:

@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]
        yield item
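A small usage sketch, assuming the fixed wrapper above: since the stream restarts on exhaustion and never ends, bound it with something like itertools.islice.

from itertools import islice

gen = generator_generating_function()
print(list(islice(gen, 5)))
# ['a value', 'another value', 'a value', 'another value', 'a value']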
axolotl
5

If GrzegorzOledzki's answer won't suffice, you could probably use send() to accomplish your goal. See PEP-0342 for more details on enhanced generators and yield expressions.

UPDATE: Also see itertools.tee(). It involves some of that memory vs. processing tradeoff mentioned above, but it might save some memory over just storing the generator results in a list; it depends on how you're using the generator.

Hank Gay
5

If your generator is pure in the sense that its output only depends on the passed arguments and the step number, and you want the resulting generator to be restartable, here's a short snippet that might be handy:

import copy

def generator(i):
    yield from range(i)

g = generator(10)
print(list(g))
print(list(g))

class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)

    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)

    def __next__(self):
        return next(self.local_copy)

def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)

    return tmp

@restartable
def generator2(i):
    yield from range(i)

g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))

outputs:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1
Ben Usman
4

From the official documentation of tee:

In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

So it's best to use list(iterable) instead in your case.

Shubham Chaudhary
1

You can define a function that returns your generator

def f():
  def FunctionWithYield(generator_args):
    code here...

  return FunctionWithYield

Now you can just do this as many times as you like:

for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)
SMeznaric
  • 2
    Thank you for the answer, but the main point of the question was to avoid **creation**; invoking an inner function just hides the creation - you create it twice – Dewfy Mar 07 '16 at 11:04
1

There is no option to reset iterators. An iterator pops out an item each time you call the next() function on it. The only way is to take a backup before iterating over the iterator object. Check below.

Creating an iterator object with items 0 to 9:

i=iter(range(10))

Iterating with the next() function, which pops an item out:

print(next(i))

Converting the iterator object to a list:

L=list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

So item 0 has already been popped out, and all the remaining items were popped when we converted the iterator to a list. Calling next() on the exhausted iterator now fails:

next(i)

Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(i)
StopIteration

So you need to convert the iterator to a list as a backup before you start iterating. A list can be converted back to an iterator with iter(<list-object>).
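A minimal sketch of that backup idea: materialize the iterator into a list once, then create fresh iterators from the list as often as needed.

backup = list(iter(range(10)))  # pay the iteration cost once
it1 = iter(backup)
print(list(it1))                # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
it2 = iter(backup)              # a fresh iterator over the same stored data
print(list(it2))                # the same values again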

Eric Aya
1

You can now use more_itertools.seekable (a third-party tool) which enables resetting iterators.

Install via > pip install more_itertools

import more_itertools as mit


y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)

y.seek(0)                                              # reset iterator
for x in y:
    print(x)

Note: memory consumption grows while advancing the iterator, so be wary of large iterables.
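seekable can also jump to positions other than the start; a small sketch (with a short stand-in iterable for illustration):

y = mit.seekable(iter(range(5)))
print(list(y))  # [0, 1, 2, 3, 4] - fully consumed
y.seek(2)       # jump back to index 2
print(list(y))  # [2, 3, 4]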

pylang
1

You can do that by using itertools.cycle(): create an iterator with this method and then execute a for loop over it, which will loop over its values.

For example:

from itertools import cycle

def generator():
    for j in cycle([i for i in range(5)]):
        yield j

gen = generator()
for i in range(20):
    print(next(gen))

will generate 20 numbers, 0 to 4 repeatedly.

A note from the docs:

Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).
SajanGohil
  • +1 because it works, but I see 2 issues there: 1) a big memory footprint, since the documentation states "create a copy", and 2) an infinite loop is definitely not what I want – Dewfy Nov 15 '19 at 23:54
0

I'm not sure what you meant by expensive preparation, but I guess you actually have

data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)

If that's the case, why not reuse data?

ilya n.
0

Ok, you say you want to call a generator multiple times, but initialization is expensive... What about something like this?

class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5

    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in range(5):
            yield self.start + i

y = InitializedFunctionWithYield()

for x in y():
    print(x)

for x in y():
    print(x)

Alternatively, you could just make your own class that follows the iterator protocol and defines some sort of 'reset' function.

class MyIterator(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.i = 5

    def __iter__(self):
        return self

    def __next__(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()

for x in my_iterator:
    print(x)

print('resetting...')
my_iterator.reset()

for x in my_iterator:
    print(x)

https://docs.python.org/2/library/stdtypes.html#iterator-types http://anandology.com/python-practice-book/iterators.html

tvt173
  • You just delegate the problem to a wrapper. Assume that the expensive initialization creates a generator. My question was about how to reset inside your `__call__` – Dewfy Nov 15 '16 at 19:36
  • Added a second example in response to your comment. This is essentially a custom generator with a reset method. – tvt173 Nov 22 '16 at 01:09
-1

My answer solves a slightly different problem: the generator is expensive to initialize and each generated object is expensive to generate, but we need to consume the generator multiple times in multiple functions. In order to call the generator and generate each object exactly once, we can use threads and run each of the consuming methods in a different thread. We may not achieve true parallelism due to the GIL, but we will achieve our goal.

This approach did a good job in the following case: a deep learning model processes a lot of images, and the result is a lot of masks for a lot of objects in each image. Each mask consumes memory. We have around 10 methods which compute different statistics and metrics, but they take all the images at once, so all the images cannot fit in memory. The methods can easily be rewritten to accept an iterator.

import threading
from typing import List

class GeneratorSplitter:
    '''
    Split a generator object into multiple generators which will be synchronised.
    Each call to each of the sub-generators will cause only one call to the input
    generator. This way multiple methods on threads can iterate the input
    generator, and the generator will be cycled only once.
    '''

    def __init__(self, gen):
        self.gen = gen
        self.consumers: List["GeneratorSplitter.InnerGen"] = []
        self.thread: threading.Thread = None
        self.value = None
        self.finished = False
        self.exception = None

    def GetConsumer(self):
        # Returns a generator object.
        cons = self.InnerGen(self)
        self.consumers.append(cons)
        return cons

    def _Work(self):
        try:
            for d in self.gen:
                for cons in self.consumers:
                    cons.consumed.wait()
                    cons.consumed.clear()

                self.value = d

                for cons in self.consumers:
                    cons.readyToRead.set()

            for cons in self.consumers:
                cons.consumed.wait()

            self.finished = True

            for cons in self.consumers:
                cons.readyToRead.set()
        except Exception as ex:
            self.exception = ex
            for cons in self.consumers:
                cons.readyToRead.set()

    def Start(self):
        self.thread = threading.Thread(target=self._Work)
        self.thread.start()

    class InnerGen:
        def __init__(self, parent: "GeneratorSplitter"):
            self.parent: "GeneratorSplitter" = parent
            self.readyToRead: threading.Event = threading.Event()
            self.consumed: threading.Event = threading.Event()
            self.consumed.set()

        def __iter__(self):
            return self

        def __next__(self):
            self.readyToRead.wait()
            self.readyToRead.clear()
            if self.parent.finished:
                raise StopIteration()
            if self.parent.exception:
                raise self.parent.exception
            val = self.parent.value
            self.consumed.set()
            return val

Usage:

from concurrent.futures import ThreadPoolExecutor

genSplitter = GeneratorSplitter(expensiveGenerator)

metrics={}
executor = ThreadPoolExecutor(max_workers=3)
f1 = executor.submit(mean,genSplitter.GetConsumer())
f2 = executor.submit(max,genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric,genSplitter.GetConsumer())
genSplitter.Start()

metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())
Asen
  • You just reinvent `itertools.islice` or, for async, `aiostream.stream.take`, and this post allows you to do it in an async/await way https://stackoverflow.com/a/42379188/149818 – Dewfy Jul 08 '20 at 10:52
  • No. islice "Makes an iterator that returns selected elements from the iterable. Elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. ..." My goal is to consume each element multiple times in different functions which are designed to consume the whole iterator, without generating each element more than once, and without iterating the iterator more than once. – Asen Jul 06 '21 at 18:11
  • 1
    I see a lot of explanation about motivating factors for coming up with this solution, but nothing that actually tries to explain what the solution does or how it works. – Karl Knechtel Jan 03 '23 at 09:49
-2

If you want to reuse this generator multiple times with a predefined set of arguments, you can use functools.partial.

from functools import partial
func_with_yield = partial(FunctionWithYield, arg0, arg1)

for i in range(100):
    for x in func_with_yield():
        print(x)

This will wrap the generator function in another function, so that each time you call func_with_yield() it creates a fresh generator object from the same generator function and arguments.

Jack-P
  • 5
  • does not work for my case, passing a generator as a callable to another function. – Kots Dec 09 '22 at 17:59
  • Using `partial` while not actually binding any arguments doesn't do anything useful. Yes, calling `func_with_yield()` would create a new generator object, by calling the `FunctionWithYield` generator function. You know what **else** would do that? **Calling `FunctionWithYield`**. – Karl Knechtel Jan 03 '23 at 09:47
  • @KarlKnechtel Yes you're right, let me update that – Jack-P Jan 24 '23 at 04:05
-3

It can be done with a code object. Here is an example.

code_str = "y = (a for a in [1, 2, 3, 4])"
code1 = compile(code_str, '<string>', 'single')
exec(code1)
for i in y: print(i)

1 2 3 4

for i in y: print(i)

(no output - the generator is exhausted)

exec(code1)
for i in y: print(i)

1 2 3 4

OlegOS
  • 4
    well, actually resetting the generator was needed to avoid executing the initialization code twice. Your approach (1) executes initialization twice anyway, and (2) it involves `exec`, which is slightly non-recommended for such a simple case. – Dewfy Aug 27 '13 at 11:58