
I am using Boost/ASIO to write a C++ server and an accompanying client app which talk over TCP/IP. I was seeing delays between consecutive receives, which were causing lower-than-expected throughput between the server and the client. The code on either side looked roughly like this:

class MyStream
{
  ...

  void doStuff()
  {
    asio::async_read(socket, buffer, bind(&MyStream::readCallback, this, _1, _2)); // get some bytes
  }

  void readCallback(const error_code& err, size_t bytes_transferred)
  {
    processData(bytes_transferred);  // maybe write some data back

    asio::async_read(socket, buffer, bind(&MyStream::readCallback, this, _1, _2)); // get some additional bytes
  }

  ...
};

The messaging between the server and client was very slow, on the order of 20-30 messages back and forth per second. I was testing on the local machine, using very small messages.

Compiling with the ASIO flag -DASIO_ENABLE_HANDLER_TRACKING to enable handler tracking, I observed delays between some of the receives that were consistently ~40 ms:

...
@asio|1613871309.603585|>22|ec=asio.system:0,bytes_transferred=30
@asio|1613871309.603688|22^34|in 'ssl::stream<>::async_read_some' ([redacted]/asio/include/asio/ssl/detail/io.hpp:167)
@asio|1613871309.603688|22*34|socket@0x7fdebc013f80.async_receive
@asio|1613871309.603703|.34|non_blocking_recv,ec=asio.system:11,bytes_transferred=0
@asio|1613871309.603725|<22|
@asio|1613871309.643974|.34|non_blocking_recv,ec=asio.system:0,bytes_transferred=248
@asio|1613871309.644013|>34|ec=asio.system:0,bytes_transferred=248
...

What could be causing this 40ms delay?

kexu

1 Answer


The issue was that the messages were very small (a few bytes each) and were being written in multiple parts. On a TCP connection with both Nagle's algorithm and delayed ACKs enabled, this pattern produces delays between the separate sends: the sender waits for an ACK before sending more data, but the receiver delays that ACK (by 40 ms by default on my system).
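
To illustrate (with hypothetical names, since the real code is larger), the problematic pattern was essentially one logical message split across two separate writes. Each write can go out as its own tiny segment; with Nagle enabled, the second segment is held back until the first is ACKed, and the receiver delays that ACK.

// Hypothetical sketch of the original write path: a small header followed by a
// small payload, sent as two separate writes. Nagle holds the second segment
// until the first one is ACKed, and the delayed-ACK timer adds the ~40 ms pause.
#include <asio.hpp>
#include <cstdint>
#include <string>

void sendMessage(asio::ip::tcp::socket& socket, const std::string& payload)
{
  std::uint32_t header = static_cast<std::uint32_t>(payload.size());
  asio::write(socket, asio::buffer(&header, sizeof(header))); // first small write
  asio::write(socket, asio::buffer(payload));                 // second small write, gets delayed
}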

Edit 1:

As David Schwartz mentions in the comments, the proper solution is to aggregate the outbound data.
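
A minimal sketch of that fix, reusing the hypothetical header-plus-payload message from above: hand both pieces to a single write call as a buffer sequence (scatter-gather), so the whole message goes out together and Nagle has nothing to hold back.

#include <array>   // plus the same includes as the sketch above

void sendMessageAggregated(asio::ip::tcp::socket& socket, const std::string& payload)
{
  std::uint32_t header = static_cast<std::uint32_t>(payload.size());
  // Scatter-gather: both pieces go through one write call and, for small
  // messages, out in a single TCP segment.
  std::array<asio::const_buffer, 2> buffers = {
    asio::buffer(&header, sizeof(header)),
    asio::buffer(payload)
  };
  asio::write(socket, buffers);
}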

Separately, as a sort-of workaround, one can set the TCP_NODELAY option, which disables Nagle's algorithm:

socket.set_option(asio::ip::tcp::no_delay(true));

There is also a TCP_QUICKACK flag, but from the docs:

This flag is not permanent, it only enables a switch to or from quickack mode. Subsequent operation of the TCP protocol will once again enter/leave quickack mode depending on internal protocol processing and factors such as delayed ack timeouts occurring and data transfer. This option should not be used in code intended to be portable.
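
For completeness: as far as I know, Asio has no portable socket-option wrapper for TCP_QUICKACK, so setting it means going through the native handle. A Linux-only sketch (and, per the quote above, the flag is not sticky, so it needs to be re-applied, e.g. after each message-wise receive):

#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_QUICKACK (Linux-specific)

void enableQuickAck(asio::ip::tcp::socket& socket)
{
  int on = 1;
  // Not permanent: the kernel can switch back to delayed ACKs on its own,
  // which is why this ends up being re-issued after each receive.
  ::setsockopt(socket.native_handle(), IPPROTO_TCP, TCP_QUICKACK,
               &on, sizeof(on));
}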

Edit 2: For posterity: I modified the code twice, once using the no_delay option, and once by changing the logic to do scatter-gather writes plus quick ACKs after message-wise receives (the protocol I'm using does not have application-level ACKs). With only no_delay, throughput went from ~30 msg/s to ~300 msg/s. With only scatter-gather writes plus quick ACKs, throughput was about ~1000 msg/s. This is for my particular use case, but I hope it gives some sense of what worked best for me.

Edit 3: After combining TCP_NODELAY with scatter-gather writes, I achieved the best throughput, something like 15% faster than scatter-gather writes with quick ACKs (which turn out to require a pretty slow system call).

kexu
  • The old Nagle algorithm issue – doron Feb 21 '21 at 13:33
  • 1
    That's a terrible solution though. I hope that's not how you actually resolved the issue. – David Schwartz Feb 21 '21 at 13:36
  • 1
    @DavidSchwartz What do you recommend? I wrote this post intending to help others seeing the same issue, but I am certainly not an expert. If there is a better solution, I would appreciate knowing it. – kexu Feb 21 '21 at 13:48
  • 1
    @kexu You have to find and fix the actual problem. Most likely, the problem is in the send logic failing to aggregate all the outbound data that needs to be sent at the same time. It could also be in the protocol design not allowing reply data to allow ACKs to piggy-back. But disabling Nagle is only suitable for using TCP in applications not designed to work with TCP. TCP-native applications should be designed to work with Nagle and not need it disabled. Disabling Nagle gives terrible behavior under all kinds of conditions such as network congestion. – David Schwartz Feb 21 '21 at 13:57
  • 1
    @DavidSchwartz If you want low latency and you are not always writing, you're going to end up disabling Nagle. This is true regardless of whether you have small or large writes; for a large write you end up with a quick sequence of full packets, followed by additional latency on the last packet. In this case it seems like the exchange is a request/reply sequence, and enabling Nagle adds latency to each round trip. As for TCP congestion control, that is handled transparently to the application -- how does disabling Nagle make that worse? – janm Feb 21 '21 at 18:28
  • Thanks for the replies. I think it is true that the underlying issue was the separation of the outbound data into multiple writes, although @janm you bring up a good point about additional latency on the last packet, so perhaps disabling Nagle's is appropriate in my app (and is probably a case-by-case thing?). Anyway, I have updated the answer to hopefully make it more precise; please improve it if needed – kexu Feb 21 '21 at 18:35
  • 2
    @janm There's no "additional latency on the last packet". Since the last packet is the first non-full packet, it's sent immediately. If it's a request/reply sequence, then the ACK for the last packet will piggyback on the reply. So the next request will not be delayed. Disabling Nagle makes things worse because Nagle exists for a reason -- to improve behavior under congested network conditions. Without it, behavior under those conditions can be pathological. – David Schwartz Feb 21 '21 at 21:07
  • 3
    @kexu See my reply. There's no additional latency on the last packet -- it's sent immediately. And for a request/reply protocol, the TCP-level ACKs will piggyback on the application-level acknowledgements and there will be no additional latency anywhere. The people who designed TCP knew what they were doing and the ability to disable Nagle exists for applications that simply were not designed to work with TCP and where you can't fix the protocol to work without latency and without wasting bandwidth. Nagle is a hack when better options don't exist. – David Schwartz Feb 21 '21 at 21:08
  • 2
    Here's a way that probably fixes it without requiring big changes to the code design https://stackoverflow.com/questions/65700159/delay-latency-in-synchronous-boost-asio-with-unix-socket/65705168#65705168 /cc @DavidSchwartz – sehe Feb 21 '21 at 21:27
  • Thanks @DavidSchwartz for all your help. I really appreciate you taking the time to explain this – kexu Feb 21 '21 at 21:32
  • 2
    @kexu You're welcome. Note that if you had the issue in the question sehe linked, disabling Nagle would have made the problem worse, not better. At least Nagle would have aggregated the packets some of the time. By disabling Nagle, you *never* get any aggregation and so you waste bandwidth with packets with minimal data in them even when there is lots of data to send at that very time. – David Schwartz Feb 21 '21 at 21:36
  • @DavidSchwartz If there are no delayed acks on the receiver side and it fits the higher-level protocol, then yes, that is an approach. I believe TCP_QUICKACK is Linux specific. On FreeBSD, for example, turning off delayed acks is a system-wide setting which can make it less practical. There are different interactions here, but I can see cases where disabling Nagle is the right thing to do. It depends on the latency vs. congestion tradeoffs you're making with the higher level protocol. Sending a complete application-level message in a single write and Nagle disabled reduces latency. – janm Feb 22 '21 at 08:01
  • @janm It gets the same latency with Nagle enabled since Nagle won't do anything in that case. The first "unfull" packet is always sent without delay. If you send the entire application-level message in a single write, there will only be one "unfull" message. If the message gets a reply, the ACK will piggy-back on the reply, so the next unfull message will also be the first and also won't be delayed. – David Schwartz Feb 22 '21 at 20:47