
I have this Cython code (simplified):

class Callback:
    async def foo(self):
        print('called')

cdef void call_foo(void* callback):
    print('call_foo')
    asyncio.wait_for(<object>callback.foo())

async def py_call_foo():
    call_foo(Callback())

async def example():
    loop.run_until_complete(py_call_foo())

What happens though: I get RuntimeWarning: coroutine Callback.foo was never awaited. And, in fact, it is never called. However, call_foo is called.

Any idea what's going on / how to get it to actually wait for Callback.foo to complete?


Extended version

The example above leaves out some important details. In particular, it is really difficult to get hold of the return value from call_foo. The real project setup looks like this:

  1. Bison parser that has rules. Rules are given a reference to specially crafted struct, let's call it ParserState. This struct contains references to callbacks, which are called by parser when rules match.

  2. In Cython code, there's a class, let's call it Parser, that users of the package are supposed to extend to make their custom parsers. This class has methods which then need to be called from callbacks of ParserState.

  3. Parsing is supposed to happen like this:

    async def parse_file(file, parser):
        cdef ParserState state = allocate_parser_state(
            rule_callbacks,
            parser,
            file,
        )
        parse_with_bison(state)
    

The callbacks are of a general shape:

ctypedef void (*callback)(char* text, void* parser)

I have to admit I don't know how exactly asyncio implements await, and so I don't know if it is in general possible to do this with the setup that I have. My ultimate goal though is that multiple Python functions be able to iteratively parse different files, all at the same time more or less.

wvxvw
  • I think you should `await call_foo(...)` in `py_call_foo`. Also, you should `return asyncio.wait_for(...)` in `call_foo`. Otherwise, the event loop exits before `Callback.foo` is done running, and asyncio complains. – user4815162342 Feb 26 '18 at 13:09
  • @user4815162342 the problem is, in reality, `call_foo` is called in C code, and I don't have a way to get return value from that call (more concretely, it is called by code generated by `Bison`). – wvxvw Feb 26 '18 at 13:33
  • Can you *store* the return value somewhere? E.g. `callback.future = asyncio.wait_for(callback.foo())`? Then you could `await callback.future` at the end of `py_call_foo`. – user4815162342 Feb 26 '18 at 14:02
  • @user4815162342 I'll have to post an extended version to make it clear what's going on. – wvxvw Feb 26 '18 at 14:52
  • The question from my last comment still applies: if you can _store_ the intended return value, then you can `await` it as soon as you return from the `cdef`. – user4815162342 Feb 26 '18 at 16:14
  • @user4815162342 no, I cannot store this value, because the only Python code that can call `await` will see this value no sooner than the parser finishes parsing the file, at which point it is already meaningless to wait for anything. I'm reading the C asyncio code in an effort to understand how it suspends execution. My fear is that it doesn't really suspend a thread, but does something on the level of the interpreter, in which case my project of suspending a C function not specifically written to work with asyncio will not be possible. – wvxvw Feb 26 '18 at 16:38
  • The suspension is _not_ based on some interpreter trick; it is fully possible to compile it into efficient machine code (which is what Cython and PyPy do with async functions). Precisely because it doesn't use tricks, it cannot magically transform synchronous code into async. It is possible to do so if the code is non-blocking and based on callbacks - think JavaScript-style "promises" and such. Does the parser you are using support a "push" interface? – user4815162342 Feb 26 '18 at 18:00
  • @user4815162342 I could generate push interface, though I think it's considered experimental. It certainly adds a lot of overhead to the parser and the way I'll have to allocate memory for tokens etc... also speed will roll into the gutter. – wvxvw Feb 26 '18 at 18:10
  • It is not obvious that parsing using a push interface should be significantly slower; for example, the expat XML parser, known for its efficiency, supports a push interface. Either way, if your parser expects to be "in control" of the parsing process (i.e. not return to the caller until the parsing is done), you will need to use threads or their emulation such as greenlets to run multiple instances concurrently. asyncio is designed to make it easier to program with callbacks by providing suspension primitives that make it appear like you're programming imperatively. – user4815162342 Feb 26 '18 at 18:22
  • @user4815162342 in my case, I was able to write the parser in such a way, that it can call user code without needing to allocate token strings on the heap. This is possible because `Bison` basically generates one huge function that does parsing, and I could allocate a memory pool for tokens on that function's stack. As soon as I use push parser, I will have to pop in and out of Bison's code, and so it won't be able to just have the whole token pool on the stack. That's why it will be less efficient. – wvxvw Feb 27 '18 at 06:45
  • That's impressive work, although it does come with a cost that it made the parser fundamentally incompatible with asyncio (and similar forms of cooperative multitasking). I wonder, though, if the cost of _one_ heap allocation (for all tokens) really makes a difference compared to the actual parsing work done by the function? – user4815162342 Feb 27 '18 at 07:36
  • Your code cannot produce the reported error. Your code also is not asynchronous, but seems to be based on synchronous callbacks/communication. Unless you have separate actions taking place *concurrently*, ``async`` coroutines are the wrong tool. – MisterMiyagi Jul 03 '18 at 12:15
  • @MisterMiyagi `async` is never the right tool. Figured this out long time ago and moved on. It's just some sort of bizarre and useless circus. – wvxvw Jul 03 '18 at 17:26

1 Answer


TLDR:

Coroutines must be await'ed or run by an event loop. A cdef function cannot await, but it can construct and return a coroutine.

Your actual problem is mixing synchronous with asynchronous code. Case in point:

async def example():
    loop.run_until_complete(py_call_foo())

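This is similar to putting a subroutine in a Thread but never starting it. Even when started, it cannot work: loop.run_until_complete is a blocking call, so the synchronous part would prevent the asynchronous part from running. (asyncio refuses this outright and raises a RuntimeError if run_until_complete is called while the loop is already running.)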


Asynchronous code must be awaited

An async def coroutine is similar to a def ...: yield generator: calling it only instantiates it. You must interact with it to actually run it:

def foo():
     print('running!')
     yield 1

bar = foo()  # no output!
print(next(bar))  # prints `running!` followed by `1`

Similarly, when you have an async def coroutine, you must either await it or schedule it in an event loop. Since asyncio.wait_for produces a coroutine, and you never await or schedule it, it is not run. This is the cause of the RuntimeWarning.
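The same behaviour can be shown for a coroutine directly. This is a minimal sketch in plain Python, assuming Python 3.7+ for asyncio.run; the log list and the main wrapper are illustrative names that are not part of the question's code.

```python
import asyncio
import inspect

log = []

async def foo():
    log.append('running')
    return 1

# Calling the coroutine function only creates a coroutine object;
# the body has not run, so nothing has been logged yet.
coro = foo()
created_only = inspect.iscoroutine(coro) and log == []
coro.close()  # discard it explicitly instead of leaving it un-awaited

async def main():
    # Awaiting a fresh coroutine object is what actually runs the body.
    return await foo()

result = asyncio.run(main())
print(created_only, log, result)  # True ['running'] 1
```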

Note that the purpose of putting a coroutine into asyncio.wait_for is purely to add a timeout. It produces an asynchronous wrapper which must be await'ed.

async def call_foo(callback):
    print('call_foo')
    await asyncio.wait_for(callback.foo(), timeout=2)

asyncio.get_event_loop().run_until_complete(call_foo(Callback()))

Asynchronous functions need asynchronous instructions

The key for asynchronous programming is that it is cooperative: Only one coroutine executes until it yields control. Afterwards, another coroutine executes until it yields control. This means that any coroutine blocking without yielding control blocks all other coroutines as well.
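As a rough illustration of this hand-over, the sketch below runs two coroutines that each yield control after every step. The worker function and the order list are hypothetical names chosen for the example, and asyncio.run assumes Python 3.7+.

```python
import asyncio

order = []

async def worker(name, steps):
    for i in range(steps):
        order.append((name, i))
        # Yield control to the event loop so the other worker can run.
        await asyncio.sleep(0)

async def main():
    # gather schedules both workers on the same event loop.
    await asyncio.gather(worker('a', 2), worker('b', 2))

asyncio.run(main())
print(order)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

If the await inside worker were replaced by a blocking call such as time.sleep, the first worker would finish all of its steps before the second one got to start.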

In general, if something performs work without an await context, it is blocking. Notably, loop.run_until_complete is blocking. You have to call it from a synchronous function:

loop = asyncio.get_event_loop()

# async def function uses await
async def py_call_foo():
    await call_foo(Callback())

# non-await function is not async
def example():
    loop.run_until_complete(py_call_foo())

example()

Return values from coroutines

A coroutine can return results like a regular function.

async def make_result():
    await asyncio.sleep(0)
    return 1

If you await it from another coroutine, you directly get the return value:

async def print_result():
    result = await make_result()
    print(result)  # prints 1

asyncio.get_event_loop().run_until_complete(print_result())

To get the value from a coroutine inside a regular subroutine, use run_until_complete to run the coroutine:

def print_result():
    result = asyncio.get_event_loop().run_until_complete(make_result())
    print(result)

print_result()

A cdef/cpdef function cannot be a coroutine

Cython supports coroutines via yield from and await only for Python functions. Even a classical generator-based coroutine cannot be written as a cdef function:

Error compiling Cython file:
------------------------------------------------------------
cdef call_foo(callback):
    print('call_foo')
    yield from asyncio.wait_for(callback.foo(), timeout=2)
   ^
------------------------------------------------------------

testbed.pyx:10:4: 'yield from' not supported here

You are perfectly fine calling a synchronous cdef function from a coroutine. You are perfectly fine scheduling a coroutine from a cdef function. But you cannot await from inside a cdef function, nor await a cdef function. If you need to do that, as in your example, use a regular def function.
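Scheduling from synchronous code could look roughly like the sketch below. It is plain Python, so a regular def stands in for the cdef function here; that substitution is an assumption of the example, as is the requirement that call_foo is invoked while an event loop is already running.

```python
import asyncio

class Callback:
    async def foo(self):
        await asyncio.sleep(0)
        return 'called'

# A plain `def` standing in for a synchronous cdef function.
def call_foo(callback):
    # Schedules the coroutine as a task on the running loop and returns
    # at once; the coroutine body has not run yet at this point.
    return asyncio.ensure_future(callback.foo())

async def py_call_foo():
    task = call_foo(Callback())  # synchronous call, returns immediately
    done_early = task.done()     # False: the task is only scheduled
    result = await task          # the coroutine awaits the stored task
    return done_early, result

outcome = asyncio.run(py_call_foo())
print(outcome)  # (False, 'called')
```

This only helps if control returns to a coroutine that can await the task. A synchronous function that keeps running after scheduling never lets the event loop execute the task in the meantime.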

You can however construct and return a coroutine in a cdef function. This allows you to await the result in an outer coroutine:

# inner coroutine
async def pingpong(what):
    print('pingpong', what)
    await asyncio.sleep(0)
    return what

# cdef layer to instantiate and return coroutine
cdef make_pingpong():
    print('make_pingpong')
    return pingpong('nananana')

# outer coroutine
async def play():
    for i in range(3):
        result = await make_pingpong()
        print(i, '=>', result)

asyncio.get_event_loop().run_until_complete(play())

Note that despite the await, make_pingpong is not a coroutine. It is merely a factory for coroutines.

MisterMiyagi