
If I have a regular method that calls a coroutine in a critical execution path, using co_spawn can potentially introduce latency.

When I use co_spawn, it schedules the coroutine to run concurrently with the rest of the code, which means it doesn't block the execution of the calling method. However, there is still some overhead involved in scheduling and managing the coroutine, which can impact the overall latency of the application.

Is there a more efficient way to call a coroutine from a regular function?

Peter

1 Answer


When I use co_spawn, it schedules the coroutine to run concurrently with the rest of the code, which means it doesn't block the execution of the calling method. However, there is still some overhead involved in scheduling and managing the coroutine

This is not necessarily true. You're conflating concurrency with asynchrony.

In the context of Asio, asynchronous operations need not be scheduled at all. Instead they may delegate work to the kernel or, indeed, to hardware, which is naturally asynchronous. The only element of "scheduling" there is the invocation of the completion handler. Indeed, if your IO operations took negligible time, then the callback invocation would dominate the observed wall-clock timing.

However, IO operations are usually decidedly costly relative to, e.g., CPU-bound load. You could start and complete many tasks in the span of a single TCP round trip (even on the loopback interface). This is why asynchronous IO frameworks are popular. It's also why Windows implements IO completion ports, etc. The many innovations over time exist because they make sense.

Is there a more efficient way to call a coroutine from a regular function?

Yes. In principle, writing your own state machines directly on top of the available OS primitives is theoretically fastest. However, it would be tedious, platform-dependent and VeryHard(TM) to integrate with other code. This is the hole that Asio neatly plugs.

To reduce overhead to the minimum:

The library does clever optimizations, including (but not limited to) dispatching completions queued on the local thread and managing allocation order to maximize reuse and minimize fragmentation. Keep an eye on the immediate-completion optimization that landed in recent releases.

E.g. when you do

Live On Compiler Explorer

#include <boost/asio.hpp>
#include <cstdlib>
#include <string_view>
namespace asio = boost::asio;

using Ex = asio::io_context::executor_type;

static inline asio::awaitable<int, Ex> answer(std::string_view prompt) {
    co_return prompt.length() + 9;
}

int main() {
    asio::io_context ioc(1);
    co_spawn(ioc.get_executor(), answer("Life, the Universe and Everything"),
             [](std::exception_ptr, int i) { ::exit(i); });
    ioc.run();
}

You see the program returning 42 without any scheduling overhead. The most tangible overhead I see is allocating the coroutine frame, which you would incur anyway if you genuinely need a coroutine (otherwise you could just submit handlers to a queue, or to the io_context, directly).

Is there a more efficient way to call a coroutine from a regular function?

Coroutines can be very lightweight. Just how lightweight they become depends on your compiler's ability to optimize them, which in turn depends mostly on the complexity of the awaitable's types (promise/handle). In principle it is possible to reduce the cost further, at the cost of cutting functionality. Asio's awaitable is designed for asynchronous IO scenarios, for obvious reasons. If you don't want/need that, look instead at more low-level or general-purpose libraries, like perhaps cppcoro. Of course, you will be on your own integrating them with your application's asynchronous IO needs, if you have any.

sehe