
Let's say I have a generator like so

def gen():
    a = yield "Hello World"
    a_ = a + 1  # Imagine that on my computer "+ 1" is an expensive operation
    print "a_ = ", a_
    b = yield a_
    print "b =", b
    print "a_ =", a_
    yield b

Now let's say I do

>>> g = gen()
>>> g.next()
>>> g.send(42)
a_ =  43
43

Now we have calculated a_. Now I would like to clone my generator like so.

>>> newG = clonify(g)
>>> newG.send(7)
b = 7
a_ = 43
7

but my original g still works.

>>> g.send(11)
b = 11
a_ = 43
11

Specifically, clonify takes the state of a generator, and copies it. I could just reset my generator to be like the old one, but that would require calculating a_. Note also that I would not want to modify the generator extensively. Ideally, I could just take a generator object from a library and clonify it.

Note: itertools.tee won't work, because it does not handle sends.
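For illustration (Python 3 syntax), the objects itertools.tee returns are plain iterators that don't expose a send method at all:

```python
import itertools

def gen():
    a = yield "Hello World"
    yield a + 1

# tee produces two independent iterators over the same stream,
# but they only support next(), not send()
t1, t2 = itertools.tee(gen())
print(hasattr(t1, "send"))  # False
```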

Note: I only care about generators created by placing yield statements in a function.

PyRulez
  • This isn't as easy as it sounds. What if the generator uses a file or network stream, which can't be cloned? Though take a look at [`itertools.tee`](https://docs.python.org/3/library/itertools.html#itertools.tee) – Colonel Thirty Two Apr 23 '15 at 22:59
  • Is this a duplicate? http://stackoverflow.com/questions/4945155/how-to-clone-a-python-generator-object – tommy.carstensen Apr 23 '15 at 23:01
  • @ColonelThirtyTwo It could throw an error for all I care. It only needs to work for code that doesn't perform input or output. – PyRulez Apr 23 '15 at 23:04
  • The short answer is that you can't clone generators in Python. But it is at least theoretically possible to add your own generator-cloning support, if you don't care about portability between implementations, meaning you could make all the decisions for yourself (do files share, dup, or raise? are closure variables shared? do you deep-copy or shallow-copy locals?). If you're interested in doing so, there are a lot of good questions you could ask about it, but just "how do I do the whole thing" is probably too broad. – abarnert Apr 23 '15 at 23:07
  • @abarnert Why do you mean "add my own generator-cloning support"? Do you mean modifying python itself? – PyRulez Apr 23 '15 at 23:09
  • @abarnert I only care about generators made with yield. – PyRulez Apr 23 '15 at 23:11
  • @PyRulez: No, you (at least hopefully) don't need to modify Python itself, but you do need to do things like dig into CPython's frame and code objects, which means your code won't run on Jython, or may not even run on a different version of CPython. – abarnert Apr 23 '15 at 23:12
  • @PyRulez: What do you think you're excluding by saying that? The only other way to make generators is with something like `(x+1 for x in thing)`. – user2357112 Apr 23 '15 at 23:14
  • @user2357112 There is another method, where you make a class with the methods `send` and `next`. – PyRulez Apr 23 '15 at 23:15
  • @user2357112: Actually, even that's implemented by compiling and then calling a hidden function with a `yield x+1` statement in it… – abarnert Apr 23 '15 at 23:16
  • @PyRulez: No, that's not actually a generator. Although it would be substitutable for a generator almost everywhere, so you could call it a "generator-like object", it's not going to pass `inspect.isgenerator`, or `isinstance(g, types.GeneratorType)`, and it doesn't have the `gi_frame` attribute. – abarnert Apr 23 '15 at 23:18
  • why are you even using send? ... surely there is a better way to get the data you want into the script – Joran Beasley Apr 23 '15 at 23:21
  • @JoranBeasley: While his toy example isn't a useful example of using generators for coroutines, there are plenty of good examples out there, which would have this problem. (If there weren't, Guido would have rejected the PEP that added `send`, instead of becoming its co-author and making sure it got done in time for 2.5…) – abarnert Apr 23 '15 at 23:23
  • Anyway, now you've got me interested in whether you actually could clone a generator using just what CPython (3.4, not 2.7) exposes, so I'm going to go experiment with it and write it up; I'll come back and put a link here if I get anything interesting… – abarnert Apr 23 '15 at 23:25
  • lol yeah i was trying to come up with something too ... but so far not much luck there ... – Joran Beasley Apr 23 '15 at 23:27
  • @abarnert If its any incentive, this will allow me to implement CPS and monads and parsers using yield notation. – PyRulez Apr 23 '15 at 23:37
  • @JoranBeasley: It's not that hard in concept: the new generator can share the same code object, because they're immutable, and it just needs a new frame object with the same code, instruction pointer, globals, and builtins, and either the same/shallow-copied/deep-copied locals. The tricky bit is cramming the locals into the frame (at the Python level, locals looks like a dict, but under the covers the frame uses an array of locals and an array of references to closure cell objects). – abarnert Apr 23 '15 at 23:38
  • OK, after a bit of playing with it, it's not doable without more C level hacking than I'd hoped. I may still keep playing with it and see if I can get it to work (possibly punting on supporting closures), but no promises, and it's not going to be as nice as I implied it might be. See my answer for details. – abarnert Apr 24 '15 at 01:13
  • One last comment: This is totally off-topic here, but you might be interested in [PEP 492](https://www.python.org/dev/peps/pep-0492/), which provides a way to build and use coroutines with an F#-style `async`/`await` syntax instead of directly using generators; it might be interesting to think about how you'd clone await-generators, and whether you can implement CPS on top of them… – abarnert Apr 24 '15 at 01:20
  • Two last one-last-comments: You can of course implement monads without generators at all. For example, `list` with `lambda x: [x]` as its `return` function, and `lambda f: (lambda m: [f(x) for x in m])` as its `fmap` function, and `lambda m: [i for m2 in m for i in m2]` as its `bind` function is a monad. And I don't know why you'd want to implement CPS in Python; transforming into CPS is useful for proving things about code, but it's rarely a pleasant way to actually write the code. – abarnert Apr 24 '15 at 01:26
  • @abarnert I mean monads in something analogous to do notation. Also, you can use CPS to make monads. – PyRulez Apr 24 '15 at 01:29
  • @PyRulez: I'm not sure that clonable coroutines are either necessary or sufficient to build something like `do` notation, but this obviously isn't the place to discuss that further… Anyway, while `do` notation may require monads, monads do not require `do` notation; any type with appropriate `return`, `fmap`, and `join` is a monad. – abarnert Apr 24 '15 at 01:32
  • @PyRulez: As for using CPS to make monads, sure, but you can also make monads without CPS. In particular, neither OCaml's perform nor Haskell's do is implemented in terms of CPS (although transforming to CPS is probably the easiest way to show that the two are equivalent), and F#'s imperative expression is translated into the equivalent of a do without using CPS. But, more importantly, we already have the equivalent of F#'s imperative expression: it's just a block of statements. So, why do you want to build something more complicated with the same effect? – abarnert Apr 24 '15 at 01:34

3 Answers


Python doesn't have any support for cloning generators.

Conceptually, this should be implementable, at least for CPython. But practically, it turns out to be very hard.


Under the covers, a generator is basically nothing but a wrapper around a stack frame.*

And a frame object is essentially just a code object, an instruction pointer (an index into that code object), the builtins/globals/locals environment, an exception state, and some flags and debugging info.
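All of those pieces are visible from Python on a live generator. A minimal sketch (CPython, Python 3 syntax):

```python
def gen():
    x = 1
    yield x
    yield x + 1

g = gen()
next(g)                        # advance to the first yield
frame = g.gi_frame             # the stack frame the generator wraps
print(frame.f_code.co_name)    # 'gen' -- the code object
print(frame.f_lasti)           # the instruction pointer (a bytecode offset)
print(frame.f_locals)          # {'x': 1} -- the locals environment
```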

And both types are exposed to the Python level,** as are all the bits they need. So, it really should be just a matter of:

  • Create a frame object just like g.gi_frame, but with a copy of the locals instead of the original locals. (All the user-level questions come down to whether to shallow-copy, deep-copy, or one of the above plus recursively cloning generators here.)
  • Create a generator object out of the new frame object (and its code and running flag).

And there's no obvious practical reason it shouldn't be possible to construct a frame object out of its bits, just as it is for a code object or most of the other hidden builtin types.
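Indeed, code objects can be rebuilt from Python (via `CodeType.replace`, available since 3.8), while frame objects expose no constructor at all. A quick demonstration:

```python
import types

def f():
    pass

# Code objects are constructible/replaceable from Python...
new_code = f.__code__.replace(co_name="g")
print(new_code.co_name)  # 'g'

# ...but frame objects are not:
try:
    types.FrameType()
except TypeError as exc:
    print(exc)  # e.g. "cannot create 'frame' instances"
```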


Unfortunately, as it turns out, Python doesn't expose a way to construct a frame object. I thought you could get around that just by using ctypes.pythonapi to call PyFrame_New, but the first argument to that is a PyThreadState—which you definitely can't construct from Python, and shouldn't be able to. So, to make this work, you either have to:

  • Reproduce everything PyFrame_New does by banging on the C structs via ctypes, or
  • Manually build a fake PyThreadState by banging on the C structs (which will still require reading the code to PyFrame_New carefully to know what you have to fake).

I think this may still be doable (and I plan to play with it; if I come up with anything, I'll update the Cloning generators post on my blog), but it's definitely not going to be trivial—or, of course, even remotely portable.


There are also a couple of minor problems.

  • Locals are exposed to Python as a dict (whether you call locals() for your own, or access g.gi_frame.f_locals for a generator you want to clone). Under the covers, locals are actually stored on the C stack.*** You can get around this by using ctypes.pythonapi to call PyFrame_LocalsToFast and PyFrame_FastToLocals. But the dict just contains the values, not cell objects, so doing this shuffle will turn all nonlocal variables into local variables in the clone.****

  • Exception state is exposed to Python as a type/value/traceback 3-tuple, but inside a frame there's also a borrowed (non-refcounted) reference to the owning generator (or NULL if it's not a generator frame). (The source explains why.) So, your frame-constructing function can't refcount the generator or you have a cycle and therefore a leak, but it has to refcount the generator or you have a potentially dangling pointer until the frame is assigned to a generator. The obvious answer seems to be to leave the generator NULL at frame construction, and have the generator-constructing function do the equivalent of self.gi_f.f_generator = self; Py_DECREF(self).
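The locals round-trip described in the first bullet looks like this in practice (CPython-only sketch; note that on 3.13+, PEP 667 makes f_locals a write-through proxy, so the ctypes call becomes a no-op there):

```python
import ctypes
import sys

def demo():
    x = 1
    frame = sys._getframe()
    frame.f_locals["x"] = 99   # mutate the dict view of the locals...
    # ...then copy it back into the fast-locals array. Required on
    # CPython <= 3.12; a harmless no-op on 3.13+ (PEP 667).
    ctypes.pythonapi.PyFrame_LocalsToFast(ctypes.py_object(frame),
                                          ctypes.c_int(0))
    return x

print(demo())  # 99
```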


* It also keeps a copy of the frame's code object and running flag, so they can be accessed after the generator exits and disposes of the frame.

** generator and frame are hidden from builtins, but they're available as types.GeneratorType and types.FrameType. And they have docstrings, descriptions of their attributes in the inspect module, etc., just like function and code objects.

*** When you compile a function definition, the compiler makes a list of all the locals, stored in co_varnames, and turns each variable reference into a LOAD_FAST/STORE_FAST opcode with the index into co_varnames as its argument. When a function call is executed, the frame object stores the stack pointer in f_valuestack, pushes len(co_varnames)*sizeof(PyObject *) onto the stack, and then LOAD_FAST 0 just accesses *f_valuestack[0]. Closures are more complicated; a bit too much to explain in a comment on an SO answer.
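The co_varnames/LOAD_FAST machinery described above is easy to see with the dis module (a small illustration):

```python
import dis

def f():
    x = 1
    return x

# the compiler's list of locals, indexed by the *_FAST opcodes
print(f.__code__.co_varnames)   # ('x',)
opnames = [i.opname for i in dis.get_instructions(f)]
print(opnames)                  # includes STORE_FAST and LOAD_FAST
```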

**** I'm assuming you wanted the clone to share the original's closure references. If you were hoping to recursively clone all the frames up the stack to get a new set of closure references to bind, that adds another problem: there's no way to construct new cell objects from Python either.

abarnert
  • You should suggest a PEP. – PyRulez Apr 24 '15 at 01:18
  • @PyRulez: A PEP needs a good Rationale section, and I don't have a good rationale for why constructing `frame` and `generator` objects (and `cell` objects, if you want that) is necessary. Besides, even with a rationale, I don't think the idea is PEP-ready; it would need some discussion on python-ideas first. – abarnert Apr 24 '15 at 01:27
  • @PyRulez: You can of course go start that discussion on python-ideas, but be aware that the first 20 responses are going to be asking for a good use case and showing you that toy use case is better served with a solution that doesn't require cloning generators, so I'd first work out the smallest use case you can think of that really is useful and really does require cloning generators if you want to get good uptake. – abarnert Apr 24 '15 at 01:28
  • Use Case: Something like Haskell's parsec in Python. With cloning, you can try different paths of the code. Nondeterminism with list monads as well. – PyRulez Apr 24 '15 at 01:32
  • I've got a use case. I'm trying to determine the shape of an n-dimensional generator expression of generator expressions (e.g. `((m*i + j for j in range(m)) for i in range(n))`) as well as an iterator over the flattened list. I think it's impossible to do without cloning the generator. [Here's a GitHub gist that shows exactly what I mean.](https://gist.github.com/castle-bravo/b4ccd4f7013cce06ed7d22e6e586c13f) – castle-bravo Dec 23 '16 at 20:41

You can't, in general. However, if you parametrise over some expensive operation, why not lift that operation out, creating a generator factory?

def make_gen(a):
    a_ = [a + 1]  # Perform expensive calculation
    def gen(a_=a_):
        while True:
            print "a_ = ", a_
            a_[0] = yield a_[0]
    return gen

Then you can create as many generators as you like from the returned object:

gen = make_gen(42)
g = gen()
g.send(None)
# a_ = [43]
g.send(7)
# a_ = [7]
new_g = gen()
new_g.send(None)
# a_ = [7]
nelfin
  • You can't set `a` using send. – PyRulez Apr 23 '15 at 23:13
  • I sure am, right there in the example. Though to be fair, I'm missing a `g.send(None)` – nelfin Apr 23 '15 at 23:18
  • @nelfin: I think his point is that if you wanted to do `a_ = yield b`, it wouldn't work, because you've turned `a_` into a closure variable. (In Python 3, there would be a trivial fix: just add `nonlocal a_` to the top of the generator. But in 2.7, it's not as easy.) – abarnert Apr 23 '15 at 23:20
  • Ah, okay thanks @abarnert. In that case, just wrap `a_` in a mutable type like a `dict` and rebind a name for that locally in `gen`. – nelfin Apr 23 '15 at 23:29
  • In the example, `g.send(42)` was run to assign 42 to `a`, but in this, it's a function call. – PyRulez Apr 23 '15 at 23:40
  • When you have a hammer, everything looks like a nail. – nelfin Apr 23 '15 at 23:44
  • @nelfin: Agreed; it's the same as every other problem with factories generating closures over mutable variables in Python 2.x, nothing special to generators, so the same workarounds (or the actual fix of upgrading to 3.x:) work. And I think this is the best solution for most obvious use cases of cloning generators (for the same reason that a closure factory is the best solution for most possible use cases of cloning or modifying function objects…). – abarnert Apr 24 '15 at 01:15

Whilst not technically returning a generator, if you don't mind fully expanding your sequence:

source = ( x**2 for x in range(10) )
source1, source2 = zip(*( (s,s) for s in source ))

>>> print( source1, type(source1) )
(0, 1, 4, 9, 16, 25, 36, 49, 64, 81) <class 'tuple'>

>>> print( source2, type(source2) )
(0, 1, 4, 9, 16, 25, 36, 49, 64, 81) <class 'tuple'>

If your function is expensive, then consider using either joblib or pathos.multiprocessing. Joblib has simpler syntax and handles pool management behind the scenes, but only supports batch processing. Pathos forces you to manually manage and close your ProcessPools, but it also has the pool.imap() and pool.uimap() functions, which return generators.

import os
from pathos.multiprocessing import ProcessPool

pool = ProcessPool(ncpus=os.cpu_count())
try:
    def expensive(x): return x**2
    source  = range(10)
    results = pool.imap(expensive, source)
    for result in results:
        print(result)
except KeyboardInterrupt: pass
finally:
    pool.terminate()

In theory, you could set this to run in a separate thread and pass in two queue objects that could be read independently, preserving generator-like behavior.
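A minimal sketch of that idea (illustrative only; `queue_tee` is a hypothetical helper name, and this supports iteration only, not send()):

```python
import queue
import threading

def queue_tee(source, n=2):
    # Fan a generator's output into n queues, each drained by an
    # independent iterator running off a background pump thread.
    queues = [queue.Queue() for _ in range(n)]
    sentinel = object()  # marks end of stream

    def pump():
        for item in source:
            for q in queues:
                q.put(item)
        for q in queues:
            q.put(sentinel)

    threading.Thread(target=pump, daemon=True).start()

    def drain(q):
        while True:
            item = q.get()
            if item is sentinel:
                return
            yield item

    return [drain(q) for q in queues]

a, b = queue_tee(x**2 for x in range(5))
print(list(a))  # [0, 1, 4, 9, 16]
print(list(b))  # [0, 1, 4, 9, 16]
```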

James McGuigan