I'm having trouble managing the work .post()'ed to Boost.Asio's io_context, and I have multiple questions about it (newbie warning).

Background: I'm writing a library that connects to a large number of different hosts for short periods at a time (connect, send data, receive the answer, close), and I figured I'd use Boost.Asio for it. The documentation is scarce (too DRY?).
My current approach is this (assuming a quad-core machine): two threads pinned to two physical cores run CPU-bound synchronous operations and post() additional work items to the io_context. Two other threads are .run()ning the io_context and executing the completion handlers, roughly as sketched below.
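In code, the setup looks roughly like this (a minimal sketch of the idea, not my actual library code; the work guard and the loop bounds are just for illustration):

```cpp
// Two "producer" threads do CPU-bound work and post() follow-up work,
// two "consumer" threads run() the io_context and execute the handlers.
#include <boost/asio.hpp>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    boost::asio::io_context io;
    // Keep run() from returning while the producers are still posting.
    auto guard = boost::asio::make_work_guard(io);

    std::vector<std::thread> consumers;
    for (int i = 0; i < 2; ++i)
        consumers.emplace_back([&io] { io.run(); });

    std::vector<std::thread> producers;
    for (int i = 0; i < 2; ++i)
        producers.emplace_back([&io, i] {
            for (int j = 0; j < 10; ++j) {
                // ... CPU-bound synchronous work would happen here ...
                boost::asio::post(io, [i, j] {
                    std::cout << "handler from producer " << i
                              << ", item " << j << '\n';
                });
            }
        });

    for (auto& t : producers) t.join();
    guard.reset();                      // let run() exit once the queue drains
    for (auto& t : consumers) t.join();
}
```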

1- The work scheduler
As per this amazing answer,

Boost.Asio may start some of the work as soon as it has been told about it, and other times it may wait to do the work at a later point in time.

When does Boost.Asio do which? On what basis is the queued work processed later?
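To make the question concrete, here is my current understanding of the "now vs. later" split, sketched with post() and dispatch() (this sketch is my own illustration, not taken from the answer I quoted):

```cpp
// dispatch() may run the handler immediately if the calling thread is already
// running the io_context, while post() always queues it for later.
#include <boost/asio.hpp>
#include <iostream>

int main() {
    boost::asio::io_context io;

    boost::asio::post(io, [&io] {
        std::cout << "outer handler\n";

        boost::asio::dispatch(io, [] {
            // We are inside a thread that is running io.run(), so this
            // may execute right here, before "after dispatch" is printed.
            std::cout << "dispatched handler\n";
        });
        std::cout << "after dispatch\n";

        boost::asio::post(io, [] {
            // post() never runs inline; this is queued and only runs
            // after the outer handler has returned.
            std::cout << "posted handler\n";
        });
        std::cout << "after post\n";
    });

    io.run();   // a single thread drains the queue
}
```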

2- Multiple Producers/ Multiple Consumers

As per this article,

At its core, Boost Asio provides a task execution framework that you can use to perform operations of any kind. You create your tasks as function objects and post them to a task queue maintained by Boost Asio. You enlist one or more threads to pick these tasks (function objects) and invoke them. The threads keep picking up tasks, one after the other till the task queues are empty at which point the threads do not block but exit.

I am failing to find a way to put a cap on the length of this task queue. This answer gives a couple of solutions, but they both involve locking, something I'd like to avoid as much as possible.
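One lock-free-ish idea I've been toying with (an untested sketch; the bounded_poster class and its names are made up by me): reserve a slot in an atomic counter before posting, and have the producer back off or do other work when the cap is reached:

```cpp
// Cap the number of outstanding posted handlers with an atomic counter,
// so there is no mutex on the fast path.
#include <boost/asio.hpp>
#include <atomic>
#include <cstddef>
#include <utility>

class bounded_poster {
public:
    bounded_poster(boost::asio::io_context& io, std::size_t cap)
        : io_(io), cap_(cap) {}

    template <typename Handler>
    bool try_post(Handler&& h) {
        // Reserve a slot; give it back and report failure if over the cap.
        if (outstanding_.fetch_add(1, std::memory_order_acquire) >= cap_) {
            outstanding_.fetch_sub(1, std::memory_order_release);
            return false;   // caller decides whether to retry or do other work
        }
        boost::asio::post(io_,
            [this, h = std::forward<Handler>(h)]() mutable {
                h();
                outstanding_.fetch_sub(1, std::memory_order_release);
            });
        return true;
    }

private:
    boost::asio::io_context& io_;
    std::size_t cap_;
    std::atomic<std::size_t> outstanding_{0};
};
```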

3- Are strands really necessary? How do I "disable" them?
As detailed in this answer, Boost.Asio uses an implicit strand per connection. Since I'm potentially making millions of connections, the memory savings from "bypassing" strands look worthwhile to me. The requests I make are independent (a different host for each request), and the operations within a single connection are already serialized (a callback chain, roughly the shape sketched below), so I have no overlapping reads and writes and expect no synchronization from Boost.Asio. Does it make sense for me to try and bypass strands? If so, how?
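For reference, my per-connection code looks roughly like this (a simplified sketch; the class and member names are placeholders): each async operation is started only from the previous handler, so the handlers of one connection never overlap:

```cpp
#include <boost/asio.hpp>
#include <array>
#include <memory>
#include <string>

using boost::asio::ip::tcp;

class connection : public std::enable_shared_from_this<connection> {
public:
    connection(boost::asio::io_context& io, std::string request)
        : socket_(io), request_(std::move(request)) {}

    // Kick off the connect -> write -> read chain.
    void start(const tcp::endpoint& ep) {
        auto self = shared_from_this();
        socket_.async_connect(ep, [self](boost::system::error_code ec) {
            if (!ec) self->write();
        });
    }

private:
    void write() {
        auto self = shared_from_this();
        boost::asio::async_write(socket_, boost::asio::buffer(request_),
            [self](boost::system::error_code ec, std::size_t) {
                if (!ec) self->read();
            });
    }

    void read() {
        auto self = shared_from_this();
        socket_.async_read_some(boost::asio::buffer(reply_),
            [self](boost::system::error_code ec, std::size_t) {
                // Process the reply; the chain ends here and the socket is
                // closed when the last shared_ptr goes out of scope.
            });
    }

    tcp::socket socket_;
    std::string request_;
    std::array<char, 4096> reply_{};
};
```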

4- Scaling design approach (a bit vague because I have no clue)
As stated in the background section, I'm running two io_contexts on two physical cores, each with two threads, one for writing and one for reading. My goal here is to spew packets as fast as I can, and I have already

  • Compiled asio with BoringSSL (OpenSSL is a serious bottleneck)
  • Wrote my own c-ares resolver service to avoid the "async-ish" DNS queries that otherwise run as blocking lookups on a background thread.

But it still happens that my network driver starts timing out when many connections are opened at once. So how do I dynamically adjust Boost.Asio's throughput so that the network adapter can cope with it? (A sketch of the kind of throttling I'm imagining is below.)
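This is a sketch only; connection_throttle and the cap value are placeholders I made up, and it assumes all of its handlers run on a single thread (otherwise the counters would need a strand or other synchronization):

```cpp
// Keep at most N connections in flight and start the next host
// only when one of the running connections finishes.
#include <boost/asio.hpp>
#include <cstddef>
#include <deque>
#include <functional>

class connection_throttle {
public:
    connection_throttle(boost::asio::io_context& io, std::size_t max_in_flight)
        : io_(io), max_in_flight_(max_in_flight) {}

    // A job must call done() exactly once, when its connection has closed.
    void enqueue(std::function<void(std::function<void()> done)> job) {
        boost::asio::post(io_, [this, job = std::move(job)]() mutable {
            pending_.push_back(std::move(job));
            maybe_start();
        });
    }

private:
    void maybe_start() {
        // Assumes these handlers are never run concurrently.
        while (in_flight_ < max_in_flight_ && !pending_.empty()) {
            auto job = std::move(pending_.front());
            pending_.pop_front();
            ++in_flight_;
            job([this] {
                boost::asio::post(io_, [this] {
                    --in_flight_;
                    maybe_start();      // a slot freed up, start the next job
                });
            });
        }
    }

    boost::asio::io_context& io_;
    std::size_t max_in_flight_;
    std::size_t in_flight_ = 0;
    std::deque<std::function<void(std::function<void()>)>> pending_;
};
```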

My questions are most likely ill-informed, as I'm no expert in network programming, and I know this is a complex problem. I'd appreciate it if someone left pointers for me to look into before closing the question or letting it go "dead".

Thank you.

    Excellent question, well researched. One immediate correction: "boost uses an implicit strand per connection" is just wrong. If you're single-threading there is an implicit (global) strand. You only get "implied strand" (hence without actual strand) if your handlers chain strictly sequentially (e.g. half-duplex write chain). Moreover in recent versions of Asio you can [supply a concurrency hint](https://www.boost.org/doc/libs/1_73_0/doc/html/boost_asio/overview/core/concurrency_hint.html)) if you want to optimize internal overhead. – sehe May 05 '20 at 13:53
  • This then answers my third question, since there is no actual strand. Thank you! – Wahib Mkadmi May 05 '20 at 16:28

0 Answers