9

As a novice C++ programmer who is new to the concept of coroutines, I am trying to study and use the feature. There is already an explanation of coroutines here: What is a coroutine?

However, I am not yet sure when and how to use coroutines. Several example use cases were provided, but those use cases have alternative solutions that can be implemented with pre-C++20 features (e.g. lazy computation of an infinite sequence can be done by a class with a private internal state variable).

Therefore I am looking for use cases where coroutines are particularly useful.

Coroutine concept

(From the image posted by Izana)

K.R.Park
  • It is important to understand that "coroutine" as a general purpose programming construct, and "coroutine" as a C++ language tool, are *not* especially similar. Yes, `co_await` does something similar to the "coroutine" concept, but only if you squint really hard at it. – Nicol Bolas Feb 17 '22 at 06:24
  • "*it seems to me that those cases can be achieved by more simpler way*" Define "simpler". A function that does an infinite loop, returning ever increasing numbers via `co_yield`, is quite simple conceptually. There's no need for some explicit state object or other boilerplate (beyond the generic generator boilerplate). – Nicol Bolas Feb 17 '22 at 06:26
  • Coroutines are just state machines under the hood. The whole point of them is to use them together with the `co_await`, `co_yield` and/or `co_return` keywords. This reduces boilerplate; in particular `co_await` counters so-called "callback hell". So it is syntactic sugar, and in fact the point is the opposite of what you claim: coroutines reduce boilerplate and make the code more readable and easier to maintain. Can it be done differently? Of course. Can it be done more easily? Unlikely. Have a look at the examples here: https://en.cppreference.com/w/cpp/language/coroutines#co_yield – freakish Feb 17 '22 at 08:14
  • @freakish Oh yes, I have seen that example and I felt it rather cumbersome; that was one motivation for asking this question. – K.R.Park Feb 17 '22 at 08:26
  • Small note: if you are using other people's images, it's polite to cite them. – JHBonarius Feb 17 '22 at 11:49
  • @JHBonarius Oh, I didn’t know that. Thanks for the remark, I will edit that ASAP. – K.R.Park Feb 17 '22 at 11:51
  • @K.R.Park: What exactly does a "canonical answer" look like for this question? What is missing from the existing answers? Especially since your question is based on what you consider to be "simpler". – Nicol Bolas Feb 19 '22 at 14:57
  • @NicolBolas Your answer looks good; if yours is still the best answer before the bounty expires, I will award it to you. – K.R.Park Feb 20 '22 at 07:41

4 Answers

12

The word "coroutine" in this context is somewhat overloaded.

The general programming concept called a "coroutine" is what is described in the question you're referring to. C++20 added a language feature called "coroutines". While C++20's coroutines bear a passing resemblance to the general programming concept, the two are not all that similar.

At the ground level, both concepts are built on the ability of a function (or call stack of functions) to halt its execution and transfer control of execution to someone else. This is done with the expectation that control will eventually be given back to the function which has surrendered execution for the time being.

Where C++ coroutines diverge from the general concept is in their limitations and designed application.

co_await <expr> as a language construct does the following (in very broad strokes). It asks the expression <expr> if it has a result value to provide at the present time. If it does have a result, then the expression extracts the value and execution in the current function continues as normal.

If the expression cannot be resolved at the present time (perhaps because <expr> is waiting on an external resource or asynchronous process or something), then the current function suspends its execution and returns control to the function that called it. The coroutine also attaches itself to the <expr> object such that, once <expr> has the value, it should resume the coroutine's execution with said value. This resumption may or may not happen on the current thread.

So we see the pattern of C++20 coroutines. Control on the current thread returns to the caller, but resumption of the coroutine is determined by the nature of the value being co_awaited on. The caller gets an object that represents the future value the coroutine will produce but has not yet. The caller can wait on it to be ready or go do something else. It may also be able to itself co_await on the future value, creating a chain of coroutines to be resumed once a value is computed.
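
To make the pattern concrete, here is a minimal, hypothetical sketch of that three-way exchange (the fire_and_forget and later types are invented for illustration, not standard library facilities): the awaiter reports that it has no value yet, the coroutine suspends and control returns to the caller, and a worker thread later supplies the value and resumes the coroutine.

#include <chrono>
#include <coroutine>
#include <iostream>
#include <thread>

// Trivial coroutine return type: start immediately, clean up when done.
struct fire_and_forget {
    struct promise_type {
        fire_and_forget get_return_object() { return {}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};

// An awaitable that never has its value "at the present time": it hands the
// suspended coroutine's handle to a worker thread, which produces the value
// and then resumes the coroutine (on that other thread).
struct later {
    int result = 0;
    bool await_ready() const noexcept { return false; }     // "no value yet"
    void await_suspend(std::coroutine_handle<> h) {
        std::thread([this, h] {
            result = 42;                                     // the "external" work
            h.resume();                                      // resume once the value exists
        }).detach();
    }
    int await_resume() const noexcept { return result; }    // hand the value back
};

fire_and_forget demo() {
    int v = co_await later{};          // suspends here; resumes on the worker thread
    std::cout << "got " << v << '\n';
}

int main() {
    demo();                                                // returns as soon as demo() suspends
    std::this_thread::sleep_for(std::chrono::seconds(1));  // crude wait for the worker thread
}

Note that resumption here happens on the worker thread, which is exactly the "may or may not happen on the current thread" caveat above.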

We also see the primary limitation: suspension applies only to the immediate function. You cannot suspend an entire stack of function calls unless each one of them individually does their own co_awaits.

C++ coroutines are a complex dance between 3 parties: the expression being awaited on, the code doing the awaiting, and the caller of the coroutine. Using co_yield essentially removes one of these three parties. Namely, the yielded expression is not expected to be involved. It's just a value which is going to be dumped to the caller. So yielding coroutines only involve the coroutine function and the caller. Yielding C++ coroutines are a bit closer to the conceptual idea of "coroutines".

Using a yielding coroutine to serve a number of values to the caller is generally called a "generator". How "simple" this makes your code depends on your generator framework (ie: the coroutine return type and its associated coroutine machinery). But good generator frameworks can expose range interfaces to the generation, allowing you to apply C++20 ranges to them and do all sorts of interesting compositions.
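
As a rough illustration of what such a generator framework might look like, here is a minimal sketch (the generator type below is invented for illustration, not a standard or production-quality type) that exposes just enough of an iterator interface for a range-for loop to work:

#include <coroutine>
#include <exception>
#include <iostream>
#include <iterator>
#include <utility>

template <typename T>
struct generator {
    struct promise_type {
        T current{};
        generator get_return_object() {
            return generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(T v) { current = std::move(v); return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    // Just enough iterator machinery for range-for; not a full input_iterator.
    struct iterator {
        std::coroutine_handle<promise_type> h;
        iterator& operator++() { h.resume(); return *this; }
        const T& operator*() const { return h.promise().current; }
        bool operator!=(std::default_sentinel_t) const { return !h.done(); }
    };

    explicit generator(std::coroutine_handle<promise_type> h) : handle(h) {}
    generator(generator&& other) noexcept : handle(std::exchange(other.handle, {})) {}
    ~generator() { if (handle) handle.destroy(); }

    iterator begin() { handle.resume(); return {handle}; }   // run up to the first co_yield
    std::default_sentinel_t end() { return {}; }

    std::coroutine_handle<promise_type> handle;
};

// The infinite sequence is an ordinary loop; its state lives in the coroutine frame.
generator<int> naturals() {
    for (int i = 0;; ++i)
        co_yield i;
}

int main() {
    for (int n : naturals()) {   // works because generator has begin()/end()
        if (n > 5) break;
        std::cout << n << ' ';
    }
    std::cout << '\n';           // prints: 0 1 2 3 4 5
}

A fuller framework would add proper iterator categories, exception propagation and range adaptors, but the shape is the same: the coroutine keeps its state in its own frame, and the caller just sees a range.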

Nicol Bolas
1

Coroutines make asynchronous programming more readable.

Without coroutines, we typically use callbacks for asynchronous programming:

#include <functional>

void callback(int data1, int data2)
{
    // do something with data1, data2 after the async op
    // ...
}

void async_op(std::function<void()> callback)
{
    // start some async operation and invoke `callback` when it completes
}

int main()
{
    // do something
    int data1 = 0;
    int data2 = 0;
    async_op(std::bind(callback, data1, data2));
    return 0;
}

If there are many callbacks, the code becomes very hard to read. With a coroutine, the code becomes:

#include <coroutine>
#include <functional>

struct promise;

struct coroutine : std::coroutine_handle<promise>
{
    using promise_type = struct promise;
};

struct promise
{
    coroutine get_return_object() { return {coroutine::from_promise(*this)}; }
    std::suspend_never initial_suspend() noexcept { return {}; }  // start running immediately
    std::suspend_never final_suspend() noexcept { return {}; }    // destroy the frame on completion
    void return_void() {}
    void unhandled_exception() {}
};

struct awaitable
{
    bool await_ready() { return false; }
    bool await_suspend(std::coroutine_handle<promise> h)
    {
        func();
        // In real code the async operation would store `h` and resume it from its
        // completion handler; this placeholder is synchronous, so returning false
        // resumes the coroutine right away.
        return false;
    }
    void await_resume() { }

    std::function<void()> func;
};

void async_op()
{
    // do some async operation
}

coroutine callasync()
{
    // do something
    int data1;
    int data2;
    co_await awaitable{async_op};
    // do something with data1, data2 after the async op
    // ...
}

int main()
{
    callasync();   // the coroutine runs to completion; its handle is not needed here
    return 0;
}
  • Hmm, as far as I know, to make it awaitable we should implement certain interfaces, like a promise_type with `get_return_object()`, `initial_suspend()`, and several others. Does the utility you mention exceed the cost of implementing them? – K.R.Park Feb 17 '22 at 08:23
  • Yes, we should implement promise_type and some awaitable interface when using coroutines. I didn't include the whole implementation in the answer; I was trying to show the difference in the async operation when using a coroutine. – tungfai fong Feb 17 '22 at 08:32
  • is `task<> main()` legal? – M.M Feb 17 '22 at 08:43
  • No, task<> would be an implementation of a coroutine handle; you have to implement it yourself, just like task in cppcoro: https://github.com/lewissbaker/cppcoro/blob/master/include/cppcoro/task.hpp – tungfai fong Feb 17 '22 at 08:47
  • I mean, `main()` cannot be a coroutine – M.M Feb 17 '22 at 08:47
  • You're right. I changed my answer and added more detail. – tungfai fong Feb 17 '22 at 08:58
  • @tungfaifong It seems more cumbersome than the original callback… – K.R.Park Feb 17 '22 at 11:43
  • @K.R.Park: Is it? You need to package `data1` and `data2` into the callback. And `bind` does it by copy. What if those data are large allocations that are expensive to copy? Or what if they're non-copyable? The coroutine version handles it all, and does so *transparently*, with no user intervention (outside of the coroutine machinery, which is 100% reusable). – Nicol Bolas Feb 17 '22 at 15:14
1

Just as lambdas in C++ save you from defining classes and functions when you want to capture context, coroutines save you from defining a class and a relatively complex function (or set of functions) when you want to be able to suspend and resume the execution of a function.

But unlike lambdas, to define and use coroutines you need a support library, and C++20 does not provide one in the standard library. As a consequence, most if not all explanations of C++ coroutines target the low-level interface and spend as much time explaining how to build the support library as how to use it, giving the impression that usage is more complex than it is. You get a "how to implement std::vector" kind of description when you want a "how to use std::vector" one.

To take the example from cppreference.com (which also provides the Generator type used below), coroutines allow you to write

Generator<uint64_t>
fibonacci_sequence(unsigned n)
{
 
  if (n==0)
    co_return;
 
  if (n>94)
    throw std::runtime_error("Too big Fibonacci sequence. Elements would overflow.");
 
  co_yield 0;
 
  if (n==1)
    co_return;
 
  co_yield 1;
 
  if (n==2)
    co_return;
 
  uint64_t a=0;
  uint64_t b=1;
 
  for (unsigned i = 2; i < n;i++)
  {
    uint64_t s=a+b;
    co_yield s;
    a=b;
    b=s;
  }
}

instead of the following (I didn't run it through a compiler, so there may be errors in it):

#include <cstdint>
#include <stdexcept>

class FibonacciSequence {
public:
    FibonacciSequence(unsigned n);
    bool done() const;
    void next();
    uint64_t value() const;
private:
    unsigned n;
    unsigned state;    
    unsigned i;
    uint64_t mValue;
    uint64_t a;
    uint64_t b;
    uint64_t s;
};

FibonacciSequence::FibonacciSequence(unsigned pN)
    : n(pN), state(1)
{}

bool FibonacciSequence::done() const
{
    return state == 0;
}

uint64_t FibonacciSequence::value() const
{
    return mValue;
}

void FibonacciSequence::next()
{
    for (;;) {
        switch (state) {
        case 0:
            return;
        case 1:
            if (n==0) {
                state = 0;
                return;
            }
            
            if (n>94)
                throw std::runtime_error("Too big Fibonacci sequence. Elements would overflow.");
            
            mValue = 0;
            state = 2;
            return;
        case 2:
            if (n==1) {
                state = 0;
                return;
            }
            mValue = 1;
            state = 3;
            return;
        case 3: 
            if (n==2) {
                state = 0;
                return;
            }
            
            a=0;
            b=1;
            i=2;
            state = 4;
            break;
        case 4:
            if (i < n) {
                s=a+b;
                mValue = s;
                state = 5;
                return;
            } else {
                state = 6;
            }
            break;
        case 5:
            a=b;
            b=s;
            i++;
            state = 4;
            break;
        case 6:
            state = 0;
            return;
        }
    }
}

FibonacciSequence fibonacci_sequence(unsigned n) {
   return FibonacciSequence(n);
}

Obviously something simpler could be used, but I wanted to show how the mapping could be done automatically, without any kind of optimization. And I've sidestepped the additional complexity of allocation and deallocation.

That transformation is useful for generators like the one here. It is more generally useful when you want a kind of cooperative concurrency, with or without parallelism. Sadly, for such things you need even more library support (including a scheduler to choose which coroutine will be executed next in a given context), and I've not seen relatively simple examples of that which show the underlying concepts without drowning in implementation details.
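
For what it's worth, below is a deliberately tiny sketch of that idea, with everything invented for illustration (task, the ready queue, yield_to_scheduler): the "scheduler" is just a run queue, and two coroutines take turns on a single thread by parking themselves on it.

#include <coroutine>
#include <deque>
#include <exception>
#include <iostream>

// The "scheduler": just a queue of coroutines that are ready to run.
std::deque<std::coroutine_handle<>> ready;

struct task {
    struct promise_type {
        task get_return_object() { return {}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };
};

// co_await yield_to_scheduler{} parks the current coroutine on the run queue.
struct yield_to_scheduler {
    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<> h) { ready.push_back(h); }
    void await_resume() const noexcept {}
};

task worker(const char* name, int steps) {
    for (int i = 0; i < steps; ++i) {
        std::cout << name << " step " << i << '\n';
        co_await yield_to_scheduler{};    // give the other coroutine a turn
    }
}

int main() {
    worker("A", 3);
    worker("B", 3);
    // Round-robin: resume whichever coroutine is at the front until none are left.
    while (!ready.empty()) {
        auto h = ready.front();
        ready.pop_front();
        h.resume();
    }
    // Output interleaves: A step 0, B step 0, A step 1, B step 1, ...
}

A real scheduler would of course add I/O readiness, timers, priorities and so on; this only shows the underlying mechanism.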

AProgrammer
1

it seems to me that those cases can be achieved by more simpler way: (ex:lazy computation of infinite sequence can be done by a class with private internal state variable).

Say you're writing a function that should interact with a remote server, creating a TCP connection, logging in with some multi-stage challenge/response protocol, making queries and getting replies (often in dribs and drabs over TCP), eventually disconnecting.... If you were writing a dedicated function to synchronously do that - as you might if you had a dedicated thread for this - then your code could very naturally reflect the stages of connection, request and response processing and disconnecting, just by the order of statements in your function and the use of flow control (for, while, switch, if). The data needed at various points would be localised in a scope reflecting its use, so it's easier for the programmer to know what's relevant at each point. This is easy to write, maintain and understand.

If, however, you wanted the interactions with the remote host to be non-blocking and to do other work in the thread while they were happening, you could make it event driven, using a class with private internal state variable[s] to track the state of your connection, as you suggest. But your class would need not only the same variables the synchronous-function version would need (e.g. a buffer for assembling incoming messages), but also variables to track where in the overall connection/processing steps you left off (e.g. enum state { tcp_connection_pending, awaiting_challenge, awaiting_login_confirmation, awaiting_reply_to_message_x, awaiting_reply_to_message_y }, counters, an output buffer), and you'd need more complex code to jump back into the right processing step. You no longer have localisation of data with its use in specific statement blocks - and instead have a flat hodge-podge of class data members and additional mental overhead in understanding which parts of the code care about them, when they're valid or not, etc. It's all spaghetti. (The State/Strategy design pattern can help structure this better, but sometimes with runtime costs for virtual dispatch, dynamic allocation, etc.)

Co-routines provide a best-of-both-worlds solution: you can think of them as providing an additional stack for the call to what looks very much like the concise and easy/fast-to-write/maintain/understand synchronous-processing function initially explained above, but with the ability to suspend and resume instead of blocking, so the same thread can progress the connection handling as well as do other work (it could even invoke the coroutine thousands of times to handle thousands of remote connections, switching efficiently between them to keep work happening as network I/O happens).
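
To sketch how that can read in code (a hypothetical toy with no real networking; session, next_event and handle_connection are invented names, and a real design would need proper error handling and buffering), the driver feeds "incoming messages" to a suspended coroutine whose body reads top to bottom like the synchronous version:

#include <coroutine>
#include <exception>
#include <iostream>
#include <string>
#include <utility>

struct session {
    struct promise_type {
        std::string pending;                 // the most recently "received" message
        session get_return_object() {
            return session{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_never initial_suspend() noexcept { return {}; }  // run to the first co_await
        std::suspend_always final_suspend() noexcept { return {}; }   // owner destroys the frame
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    // The event loop / driver calls this when "data arrives".
    void deliver(std::string msg) {
        h.promise().pending = std::move(msg);
        h.resume();
    }
    bool done() const { return h.done(); }

    explicit session(std::coroutine_handle<promise_type> handle) : h(handle) {}
    session(session&& other) noexcept : h(std::exchange(other.h, {})) {}
    ~session() { if (h) h.destroy(); }

    std::coroutine_handle<promise_type> h;
};

// co_await next_event{} suspends until the driver delivers the next message.
struct next_event {
    std::coroutine_handle<session::promise_type> h;
    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<session::promise_type> handle) { h = handle; }
    std::string await_resume() const { return h.promise().pending; }
};

session handle_connection() {
    // The protocol stages read in order, just like the blocking version would.
    std::string challenge = co_await next_event{};
    std::cout << "responding to challenge: " << challenge << '\n';
    std::string login = co_await next_event{};
    std::cout << "login result: " << login << '\n';
    std::string reply = co_await next_event{};
    std::cout << "query reply: " << reply << '\n';
}

int main() {
    session s = handle_connection();   // runs until it waits for its first event
    s.deliver("CHALLENGE abc");        // each call stands in for "bytes arrived on the socket"
    s.deliver("LOGIN OK");
    s.deliver("RESULT 42");
}

The protocol state (which stage we're in, the local variables of each stage) lives in the coroutine frame, so there is no hand-written enum of states or flat bag of data members.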

Harkening back to your "lazy computation of infinite sequence" - in one sense, a coroutine may be overkill for this, as there may not be multiple processing stages/states, or subsets of data members that are relevant therein. There are some benefits to consistency though - if providing e.g. pipelines of coroutines.

Tony Delroy