
I have a thread pool where all the threads share one boost::asio::io_context. The threads in the thread pool are always running, because io_context::run is called from the main thread and I am using a boost::asio::executor_work_guard, so the io_context never runs out of work.
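
For reference, the setup looks roughly like this (a minimal sketch only; the details in my real code differ, and in this sketch each pool thread calls run()):

#include <boost/asio.hpp>
#include <thread>
#include <vector>

int main() {
    boost::asio::io_context io;

    // the work guard keeps run() from returning when there is no work
    auto guard = boost::asio::make_work_guard(io);

    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back([&io] { io.run(); });

    // ... work gets posted to io from elsewhere ...

    guard.reset();          // allow run() to return once all work is done
    for (auto& t : pool)
        t.join();
}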

Work gets posted to the io_context and, depending on the availability of threads, it gets done. Now I want to monitor the threads, i.e. I want to check at every interval x that the threads are not stuck.

My approach to this is:

  • In the main thread I keep a map of thread ID to state (bool).
  • I post the same number of handlers to the io_context as there are threads. In these handlers I sleep for a very short time, then get the thread ID and set it in the map. I make these posts every interval t, where t < x/n.
  • At interval x I check whether all the thread IDs in the map are set, and then unset the fields.

I keep repeating this every x interval, roughly as sketched below.
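
(Sketch only; names such as post_probes, seen and all_threads_responded are just illustrative.)

#include <boost/asio.hpp>
#include <chrono>
#include <map>
#include <mutex>
#include <thread>

std::mutex mtx;
std::map<std::thread::id, bool> seen; // threadId -> responded this round

// post as many probe handlers as there are pool threads (every t interval)
void post_probes(boost::asio::io_context& io, std::size_t n_threads) {
    for (std::size_t i = 0; i < n_threads; ++i) {
        boost::asio::post(io, [] {
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
            std::lock_guard<std::mutex> lk(mtx);
            seen[std::this_thread::get_id()] = true;
        });
    }
}

// checked from the main thread every x interval
bool all_threads_responded(std::size_t n_threads) {
    std::lock_guard<std::mutex> lk(mtx);
    bool ok = (seen.size() == n_threads);
    seen.clear(); // "unset" the entries for the next round
    return ok;
}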

My Questions

  1. Is my approach of monitoring threads sharing an io_context in this way any good?
  2. If it is not good, what other approaches can I take?
nanika

1 Answer


1. Your approach will not reliably work.

The reason is that you are posting your "discovery handlers" to the service, not to specific threads, and the service makes no guarantee about how handlers are distributed across the available threads. This is typical of thread scheduling in general (even at the OS level), but it is also explicitly called out in the documentation for strands:

Remarks

The implementation makes no guarantee that handlers posted or dispatched through different strand objects will be invoked concurrently.

2. What you can do

You can consider replacing io_context::run with your own loop:

size_t my_run(asio::io_context& io) {
    size_t n = 0;
    while (io.run_one()) {
        n += 1;
        // mark current thread as active and progressing here
    }
    return n;
}

That gives you the opportunity to do some book-keeping of handler invocations as indicated by the comment.
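
For example (a rough sketch only, not production code; the names g_last_progress and all_threads_recent are invented here), the bookkeeping could be a per-thread "last progress" timestamp that your monitoring thread inspects:

#include <boost/asio.hpp>
#include <chrono>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>

using Clock = std::chrono::steady_clock;

std::mutex g_mtx;
std::map<std::thread::id, Clock::time_point> g_last_progress;

size_t my_run(boost::asio::io_context& io) {
    size_t n = 0;
    while (io.run_one()) {
        n += 1;
        // this thread just executed a handler: record the time
        std::lock_guard<std::mutex> lk(g_mtx);
        g_last_progress[std::this_thread::get_id()] = Clock::now();
    }
    return n;
}

// called from the monitoring thread every x interval
bool all_threads_recent(Clock::duration x) {
    std::lock_guard<std::mutex> lk(g_mtx);
    auto const now = Clock::now();
    for (auto const& entry : g_last_progress)
        if (now - entry.second > x)
            return false;
    return true;
}

Note that the timestamp only advances when a handler actually runs, so under no load this cannot tell an idle thread from a stuck one; that is what the second note below addresses.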

Notes

  • the loop above is likely not "production quality" and may lead to subtle changes in scheduling, specifically where e.g. locally submitted handlers might otherwise be optimized for current-strand execution. This should rarely be noticeable, and it certainly does not break any documented contracts (if it did, run_one would never have been made public).

  • if you want to reliably detect thread progress when there is no service load (because creating fake load is not reliable either), you might get "smarter" with poll_one() instead of run_one(), but that would mean effectively rewriting the scheduling yourself (with spins and back-off sleeps, or high CPU usage when idle). So I'd probably not bother.

  • perhaps it would be a better idea not to focus on the service/threads at all, but rather on executors. You can create your own executor adaptor that does the additional bookkeeping. My intuition tells me it is probably easier to get a production-quality implementation that way (see the sketch after this list).

    Some executor related examples exist in the documentation: https://www.boost.org/doc/libs/1_82_0/doc/html/boost_asio/examples/cpp14_examples.html#boost_asio.examples.cpp14_examples.executors
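
As a much-simplified stand-in for a real executor adaptor (sketch only; this is not a conforming Asio executor, and the names tracked_post and g_in_progress are invented), you can get the same kind of bookkeeping by decorating handlers at submission time. This also lets you spot a thread that started a handler long ago and never finished it:

#include <boost/asio.hpp>
#include <chrono>
#include <map>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

using Clock = std::chrono::steady_clock;

std::mutex g_mtx;
// thread id -> start time of the handler it is currently running, if any
std::map<std::thread::id, std::optional<Clock::time_point>> g_in_progress;

template <typename Handler>
void tracked_post(boost::asio::io_context& io, Handler h) {
    boost::asio::post(io, [h = std::move(h)]() mutable {
        auto const me = std::this_thread::get_id();
        {
            std::lock_guard<std::mutex> lk(g_mtx);
            g_in_progress[me] = Clock::now(); // handler started
        }
        h(); // note: this sketch does not clear the entry if h() throws
        {
            std::lock_guard<std::mutex> lk(g_mtx);
            g_in_progress[me] = std::nullopt; // handler finished
        }
    });
}

// a thread that has been inside a single handler for longer than `limit`
// is a candidate for being "stuck"
bool any_thread_stuck(Clock::duration limit) {
    std::lock_guard<std::mutex> lk(g_mtx);
    auto const now = Clock::now();
    for (auto const& entry : g_in_progress)
        if (entry.second && now - *entry.second > limit)
            return true;
    return false;
}

A full executor adaptor would hook the same bookkeeping into its dispatch/post/defer (or execute) functions, so it covers every handler, including completion handlers of asynchronous operations, without changing call sites.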

sehe
  • Oh, PS: don't forget about exception handling, which already applies when just using `io_context::run()` (https://stackoverflow.com/q/44500818/85371) – sehe Jun 07 '23 at 16:14
  • Thanks for the ideas and explanations :) . – nanika Jun 08 '23 at 12:30