
I'm searching for some guiding principles to apply when using container data structures with Boost.ASIO. The Boost.ASIO documentation describes how strand objects can be used to provide serialized access to shared resources without explicit thread synchronization. I am looking for a systematic way to apply strand synchronization to:

  • STL (or STL-like) containers (e.g., std::deque, std::unordered_map); and

  • wait-free containers such as boost::lockfree::spsc_queue or folly::ProducerConsumerQueue.

My questions are listed below. I should mention that my main questions are 1-3, but with justification I am also prepared to accept "These questions are moot/misguided" as an answer; I elaborate on this in question 4.

  1. To adapt an arbitrary STL container for safe synchronized use, is it sufficient to perform all its operations through a strand instance?

  2. To adapt a wait-free read-write container for synchronized, concurrent use, is it sufficient to wrap its operations through two distinct strands, one for read operations and one for write operations? This question hints at a "yes", although in that use case the author describes using a strand to coordinate producers from several threads, while presumably only reading from one thread.

  3. If the answer to 1-2 above is yes, should the strand just manage operations on the data structure through calls to boost::asio::post?

To give an idea of the issue in 3., here is a snippet from the chat client example:

void write(const chat_message& msg)
{
  boost::asio::post(io_context_,
      [this, msg]()
      {
        bool write_in_progress = !write_msgs_.empty();
        write_msgs_.push_back(msg);
        if (!write_in_progress)
        {
          do_write();
        }
      });
}

Here, write_msgs_ is a queue of chat messages. I ask because this is a bit of a special case where the call to post can potentially invoke a composed async operation (do_write). What if I just wanted to push or pop from a queue? To take a highly simplified example:

template<typename T>
class MyDeque {
public:
    void push_back(const T& t);
    /* ... */
private:
    std::deque<T> _deque;
    boost::asio::io_context::strand _strand;
};

Then should MyDeque::push_back(const T& t) just call

boost::asio::post(_strand, [this, t]{ _deque.push_back(t); });

and similarly for other operations? Or is boost::asio::dispatch a more appropriate option?

  4. Finally, I know that there are plenty of robust implementations of concurrent vectors, hash maps, etc. (e.g., Intel Threading Building Blocks containers). But it seems that under restricted use cases (e.g., just maintaining a list of active chat participants, storing recent messages, etc.) the force of a fully concurrent vector or hash map may be a bit overkill. Is this indeed the case, or would I be better off just using a fully concurrent data structure?

1 Answer


I am looking for a systematic way to apply strand synchronization to:

  • STL (or STL-like) containers (e.g., std::deque, std::unordered_map); and

You're looking for something that doesn't exist. The closest thing is Active Objects, and you hardly need strands for that unless you have asynchronous operations on them. That makes almost zero sense, because no operation on an STL container should have a time complexity high enough to warrant asynchrony. The computational complexity, on the other hand, would be such that adding any kind of synchronization would be very suboptimal:

Instead of fine-grained locking [which you automatically opt into when implementing STL data structures as Active Objects] you will always find better performance with coarse-grained locking.

Even earlier in the design, you will always gain more performance by reducing sharing than by "optimizing synchronization" (the two are at odds).

  • wait-free containers such as boost::lockfree::spsc_queue or folly::ProducerConsumerQueue.

Why would you even synchronize access to wait-free containers? Wait-free implies no synchronization.

The Bullets

  1. To adapt an arbitrary STL container for safe synchronized use, is it sufficient to perform all its operations through a strand instance?

    Yes. Only that's not something that exists. Strands wrap async tasks (and their completion handlers, which are just tasks from the POV of the executor).

    See the rant above.

  2. To adapt a wait-free read-write container for synchronized, concurrent use, is it sufficient to wrap its operations through two distinct strands, one for read operations and one for write operations?

    Like mentioned, it's silly to synchronize access to lock-free constructs.

    This question hints at a "yes", although in that use case the author describes using a strand to coordinate producers from several threads, while presumably only reading from one thread.

    That's specifically related to the SPSC queue, i.e. where additional constraints are placed on threads performing read/write operations.

    While indeed the solution here is to create logical threads of execution with exclusive access to either set of operations, notice that you are constraining the tasks, which is a fundamentally different angle from constraining the data.

  3. If the answer to 1-2 above is yes, should the strand just manage operations on the data structure through calls to boost::asio::post?

    So, the answer wasn't "yes". The idea of posting all operations through post would come down to implementing the Active Object pattern, as mentioned in my introduction. Yes, you can do that, and no, that's not gonna be smart. (I'm pretty sure that if you do, by definition you can forget about using lock-free containers)

    [....]
    Then should MyDeque::push_back(const T& t) just call

    boost::asio::post(_strand, [this, t]{ _deque.push_back(t); });
    

    Yes, that's the Active Object pattern. However, consider how you would implement top(). Consider what you'd do if you had two MyDeque instances (`a` and `b`) and wanted to move items from one to the other:

    if (!a.empty()) {
        auto value = a.top();     // synchronizes on the strand to have the return value
        a.pop();                  // operation on the strand of a
        b.push(std::move(value)); // operation on the strand of b
    }
    

    Since queue b is not on the strand of a, b.push() could actually commit before a.pop(), which may not be what you expect. Also, it's bleedingly obvious that all the fine-grained synchronization steps are going to be far less efficient than having a single strand for all operations that work on a given set of data structures.

  4. [...] But it seems that [...] that the force of a fully concurrent vector or hash map may be a bit overkill

    There's no "force" in fully concurrent vectors or hash maps. There's a cost to them (in terms of processor business) and a gain (in terms of lower latency). In the cases you mention, latency is rarely the issue (registering a session is an infrequent event, and dominated by actual IO speeds), so you'd be best of using the simplest thing (IMO that would be single-threaded server for those datastructures). Have workers for any non-trivial operations - they could run on a pool of threads. (E.g. if you decide to implement a chess-playing chat-bot)


You want strands to form logical threads of execution. You want to synchronize access to your data structures, not your data structures per se. Sometimes lock-free data structures are a tempting shortcut around having to design things well, but don't expect them to magically perform well.

Some links:

sehe
  • Thanks for your answer and the references in the links--the code from your example connection list is highly instructive, and the reference to the Active Object pattern is useful. I see now that my approach of "throw a strand into everything" was highly ill-conceived. (sorry for the apparent disconnect, I accidentally submitted this comment by pressing enter and then quickly deleted it, a follow up remark/minor question appears below) – Lia Stratopoulos Feb 16 '18 at 22:27
  • Would you mind commenting if the following broad guidelines are an accurate reflection of your answer: 1. for a class like the asio chat server example if it were to run multithreaded, a strand would be needed to coordinate socket ops and this strand could be used to synchronize access to the msg queue (e.g., `async_read` a message; `ReadHandler` writes it to the queue); and 2. for a more general class like your `basic_connection_pool` example, multithreading requires mutex locking because we can't make assumptions about coexisting connection pools (q.v., your `MyDeque` counterexample) – Lia Stratopoulos Feb 16 '18 at 22:47
  • _" a strand would be needed to coordinate socket ops_" - yes, unless there's an implicit strand (see another Tanner classic: [Why do I need a strand per connection](https://stackoverflow.com/questions/12794107/w/12801042#12801042)). Many stream "conversation" follow a purely sequential operation chain, so that would be an implicit chain... – sehe Feb 16 '18 at 22:54
  • Only when you start doing full-duplex or asynchronous messaging on the connection will you need to make that strand explicit. See my answer here, which [starts with synchronous code, moves on to an implicit strand and finally adds the explicit strand](https://stackoverflow.com/questions/48507883/d/48523707#48523707). – sehe Feb 16 '18 at 22:54
  • Thanks for the additional links! I think I understand explicit strands vis-a-vis a single thread calling `io_context::run` or a half-duplex protocol like HTTP but that timer example progression/discussion is very helpful. I am indeed aiming at full-duplex messaging; I didn't mention this in the original question but basically I am doing some background research to flesh out a WebSocket server/messaging endpoint built on ASIO and Beast – Lia Stratopoulos Feb 16 '18 at 23:36