
I'm trying to figure out how to write a generator function in C using the CPython API. Unfortunately, I don't understand how to do it, and the docs don't explain it very well. Could someone explain how generator functions work in low-level code and how I could create one?

Something that would work like this:

```python
def gen_func(*args):
    for arg in args:
        yield arg
```

mental
  • Do you actually need to create a generator, or is any iterator acceptable? Because iterators are pretty easy (and then this question is [probably a dup](https://stackoverflow.com/questions/1815812/how-to-create-a-generator-iterator-with-the-python-c-api)). Meanwhile, if you want a generator, does it just have to be a function that returns a generator iterator, or does it have to actually work like a generator function? – abarnert Mar 18 '18 at 04:10
  • I've been thinking about that, and I realized that iterators are easily done and generators could probably be created through `PyRun_String`. The real trick is creating a generator function: not an iterator, and it has to work like a generator function. – mental Mar 18 '18 at 04:17
  • In that case, you may want to take a look at how Cython does it, although it can be painful if you're not used to reading through Cython-generated code (and of course it's overgeneralized if you don't want to deal with `send`/etc., don't need to fake tracebacks, etc.). Create an even simpler generator (`def gen_func(): yield 1`), save it in a .pyx file, `cythonize` it, and read the `.c` file; there should be a `__pyx_pf_[mod-name-mangle]_gen_func` for the function itself, and functions near that for the `__next__`, `send`, `close`, and `throw` methods, but a lot of the guts will be in Cython. – abarnert Mar 18 '18 at 04:21
  • Anyway, the sense I get from looking at it is that you have to do everything manually—the `__next__` is basically everything you do for an iterator, plus managing a first-run flag, and the other methods all have to be written as well. – abarnert Mar 18 '18 at 04:22
  • Also, even if a function returning a simple iterator is not acceptable, you should read the linked question (which I've only skimmed very briefly so far); the OP started out wanting to create a generator function even though he ended up happy with a function returning an iterator. – abarnert Mar 18 '18 at 04:23
  • Anyway, needless to say, you can't actually `yield` anything, because C doesn't have coroutines (I mean, you could `setjmp`/`longjmp`, but then you have to work around CPython's use of the C stack…) or closures; you're going to have to rewrite the function body as something more like an iterator class even if you end up simulating a generator function on top of it. – abarnert Mar 18 '18 at 04:26
  • Finally, "how generators work in low-level code" is simpler, but useless to you: they work by using CPython frame objects, which does no good for a C function. If you want to change your question to ask for a detailed explanation on things like how generator frames and the `YIELD_VALUE` bytecode and so on work with the ceval loop, I could answer that, but I don't think it's what you're interested in. – abarnert Mar 18 '18 at 04:28
  • Oh, and yeah, of course `PyRun_String` works—or anything that can simulate eval or exec—but that seems like cheating; you're not actually building a generator function in C, but in Python. – abarnert Mar 18 '18 at 04:30
  • Could you post that as an answer so I could accept and upvote? Lol – mental Mar 18 '18 at 12:33

1 Answer


First, PyRun_String (or anything that can simulate eval or exec) can of course do it, but that seems like cheating; you're not building a generator function in C, you're building one in Python and then calling that in C.

Anyway, the reason you can't figure out how to build a generator function with the C API is that there's no C API to do it. Or, rather, there is a C API to build generators out of CPython frame objects running Python generator code objects (and to build generator functions out of generator code objects, but that part you can even do from Python; it's just the types.FunctionType constructor), but that won't do you any good. (Unless you just want to write C code that builds Python bytecode for a generator, which would be cheating just as much as PyRun_String, and more work.)
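As a small illustration of the "you can even do that part from Python" remark, here is a sketch that builds a generator function from an existing generator code object with the `types.FunctionType` constructor. The helper name `_template` is made up for this example; the code object still comes from Python source, which is exactly why this route doesn't help with writing the body in C.

```python
import inspect
import types


def _template(*args):
    # Hypothetical donor function: we only want its generator code object.
    for arg in args:
        yield arg


# Build a brand-new function object around the same code object.
# Signature: types.FunctionType(code, globals, name=None, ...)
gen_func = types.FunctionType(_template.__code__, globals(), "gen_func")

# The CO_GENERATOR flag lives on the code object, so the new function
# is recognized as a real generator function.
print(inspect.isgeneratorfunction(gen_func))
print(list(gen_func(1, 2, 3)))
```

The generator-ness comes entirely from the code object's flags, not from the function object wrapped around it.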

So, if you want to build a generator function in C, you have to do it manually. It is clearly possible to do this, as proved by the fact that Cython can do it (up to some limit--e.g., inspect.isgeneratorfunction and inspect.isgenerator will return False on gen_func and gen_func()). But it's not easy, and I'm not sure what it gets you.

The core problem is that CPython implements generators by freezing CPython frames and passing them around (hence the API). C code doesn't use CPython frames, it uses the C stack. (Even if you used setjmp/longjmp and explicit stack copying to build C coroutines, you'd be fighting with the way CPython itself uses the C stack.)
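To make "freezing CPython frames" a bit more concrete, the following sketch pokes at a suspended Python generator from the outside. It only uses the quasi-documented `gi_frame` and `gi_code` attributes and `inspect.getgeneratorstate`; the variable names are illustrative.

```python
import inspect


def gen_func(*args):
    for arg in args:
        yield arg


g = gen_func("a", "b")

# Before the first next(), the frame exists but has not started running.
state_created = inspect.getgeneratorstate(g)

first = next(g)

# After a yield, the frame is suspended with its locals intact.
state_suspended = inspect.getgeneratorstate(g)
frozen_local = g.gi_frame.f_locals["arg"]

# The generator runs the very code object that belongs to gen_func.
same_code = g.gi_code is gen_func.__code__

# Exhausting the generator releases the frame.
rest = list(g)
frame_after = g.gi_frame
state_closed = inspect.getgeneratorstate(g)
```

All of the loop state lives in that frame object, which is what a C function has no equivalent of.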

So, the only viable option I can think of is to build an iterator class (which this answer shows how to do) and then implement the rest of the generator protocol on top of that. It'll be basically the same as implementing the generator protocol in Python, but storing your state on your PyObject struct instead of in your object dict, just like translating any other class to C.
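For reference, here is a rough Python sketch of what "implementing the generator protocol on top of an iterator class" could look like for the `gen_func` in the question. The class name `ArgsGenerator` and its attributes are invented for this example; in a C translation, the same fields would sit on the extension type's struct and each method would become a C function in the type's method table. It is a simplified sketch, not a full emulation (for instance, `throw` accepts only a single exception argument).

```python
import inspect


class ArgsGenerator:
    """Hand-rolled stand-in for the generator object of gen_func(*args)."""

    def __init__(self, *args):
        self._args = args        # what the frame's locals would hold
        self._index = 0          # where the suspended loop would resume
        self._started = False    # first-run flag
        self._finished = False

    def __iter__(self):
        return self

    def __next__(self):
        return self.send(None)

    def send(self, value):
        if self._finished:
            raise StopIteration
        if not self._started and value is not None:
            raise TypeError(
                "can't send non-None value to a just-started generator")
        self._started = True
        if self._index >= len(self._args):
            self._finished = True
            raise StopIteration
        # The body never uses the result of yield, so value is ignored.
        result = self._args[self._index]
        self._index += 1
        return result

    def throw(self, exc):
        # The body has no try/except, so the exception just propagates
        # and the generator is finished afterwards.
        self._finished = True
        raise exc

    def close(self):
        self._finished = True


def gen_func(*args):
    return ArgsGenerator(*args)


print(list(gen_func(1, 2, 3)))
# Like Cython's version, this is not a real generator to inspect.
print(inspect.isgenerator(gen_func(1)))
```

Note how little `send`, `throw`, and `close` add over plain `__next__` when the body never looks at the value of `yield`.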

If you want to see what Cython does, it's essentially that, although you have to wade through a lot of boilerplate to see it. Create a file genpyx.pyx:

```python
def gen_func():
    yield None
```

Then cythonize genpyx.pyx, and look at the created genpyx.c file. (Look for __pyx_gb_6genpyx_2generator, and most of the other stuff right near it.) Despite Cython having a mechanism to partially fake up frames so it can do tracebacks through Cython code with Python on either end, it's still storing all the state explicitly in a struct it passes around through the functions, just as you'd have to. Cython does support two of the quasi-documented generator attributes gi_running and gi_yieldfrom, which is a nice idea, but it can't fake gi_frame and gi_code (any more than extension functions try to fake __code__), and it doesn't fake being an instance of types.GeneratorType (which you could sort of do, but it would be as dangerous as with any other non-heap type).

And meanwhile, if your simulated generator doesn't have any use for the value of yield, what's the point in implementing a send that takes and ignores an argument, checks that an otherwise-unnecessary first-run flag is set, and then does the same thing as __next__? Trying to implement as much of the generator protocol as possible is necessary if you're building Cython and need something to compile Cython generator bodies to, but YAGNI if you're just translating things like gen_func manually.

abarnert