TcpClient performance - sending 4 scalar values much slower than sending 1 byte array containing all values

Question

I'm writing an application where two applications (say server and client) are communicating via a TCP-based connection on localhost.

The code is fairly performance critical, so I'm trying to optimize as best as possible.

The code below is from the server application. To send messages, my naive approach was to create a BinaryWriter from the TcpClient's stream and write each value of the message via the BinaryWriter. So let's say the message consists of 4 values: a long, followed by a boolean value, and then 2 more longs. The naive approach was:

TcpClient client = ...;
var writer = new BinaryWriter(client.GetStream());

// The following takes ca. 0.55ms:

writer.Write((long)123);
writer.Write(true);
writer.Write((long)456);
writer.Write((long)2);

At 0.55ms execution time, this strikes me as fairly slow. I then tried the following instead:

TcpClient client = ...;

// The following takes ca. 0.15ms:

var b1 = BitConverter.GetBytes((long)123);
var b2 = BitConverter.GetBytes(true);
var b3 = BitConverter.GetBytes((long)456);
var b4 = BitConverter.GetBytes((long)2);

var result = new byte[b1.Length + b2.Length + b3.Length + b4.Length];
Array.Copy(b1, 0, result, 0, b1.Length);
Array.Copy(b2, 0, result, b1.Length, b2.Length);
Array.Copy(b3, 0, result, b1.Length + b2.Length, b3.Length);
Array.Copy(b4, 0, result, b1.Length + b2.Length + b3.Length, b4.Length);

client.GetStream().Write(result, 0, result.Length);

The latter runs in ca. 0.15ms, while the first approach took roughly 0.55ms, so the first is 3-4 times slower.

I'm wondering ... why? And more importantly, what would be the best way to write messages as fast as possible (while maintaining at least a minimum of code readability)?

The only way I could think of right now is to create a custom class similar to BinaryWriter; but instead of writing each value directly to the stream, it would buffer a certain amount of data (say 10,000 bytes or such) and only send it to the stream when its internal buffer is full, or when some .Flush() method is explicitly called (e.g. when message is done being written).

This should work, but I wonder if I'm overcomplicating things and there's an even simpler way to achieve good performance? And if this was indeed the best way - any suggestions how big the internal buffer should ideally be? Does it make sense to align this with Winsock's send and receive buffers, or best to make it as big as possible (or rather as big as sensible given memory constraints)?

Thanks!

And how do you measure that 0.15ms? Is that time to execute `Write` statements or actual time to deliver to client? — Evk, Mar 27 '18 at 12:16
Hi, I've measured both times directly on the server, i.e. basically with a stopwatch running from the respective comment until right after the last statement. Edit: In addition (!) I've also measured execution time on the client, i.e. everything from "request an operation from the server" until "get response"; the performance difference was very much in line with the above results as well. — Bogey, Mar 27 '18 at 12:23
Tried to reproduce out of curiosity, but wasn't able to, both approaches take 0.05-0.010ms for me. Anyway, if you want to continue using `BinaryWriter` - just write to `MemoryStream` instead. Then get buffer via `memoryStream.ToArray()` and send that. If total size of message is known beforehand, initialize that stream with expected capacity (`new MemoryStream(100)`) and then use `GetBuffer()` instead of `ToArray` to avoid unnecessary copy. — Evk, Mar 27 '18 at 12:40
You were not? Huh, that's odd. I wonder if some kind of corporate firewall/AV software could be interfering on my end? But via localhost? Hmm. Anyway, thanks, great suggestion with a MemoryStream in between — Bogey, Mar 27 '18 at 12:45
Try to warm-up your application by executing both versions before actual measurement. See [Warm-up when calling methods in C#](https://stackoverflow.com/questions/4446203/warm-up-when-calling-methods-in-c-sharp) discussion on StackOverflow. — Leonid Vasilev, Mar 27 '18 at 12:52
Thanks @LeonidVasilyev, I had done that in each case actually — Bogey, Mar 27 '18 at 12:54
What if you exclude `BinaryWriter` from first version and use `BitConverter` as in second version: `var b1 = BitConverter.GetBytes((long)123); client.GetStream().Write(b1, 0, b1.Length);`? — Leonid Vasilev, Mar 27 '18 at 13:36
@LeonidVasilyev Have just tried: performance of this is effectively the same as with BinaryWriter, so around ca 0.55ms — Bogey, Mar 27 '18 at 13:43
This might be related to issues caused by the combination of Nagle's algorithm and TCP delayed acknowledgment, but your scenario doesn't quite match because you only write to the socket. See the [Faster way to communicate using TcpClient?](https://stackoverflow.com/questions/6127419/faster-way-to-communicate-using-tcpclient) discussion on StackOverflow and John Nagle's comment in the [The trouble with the Nagle algorithm](https://developers.slashdot.org/comments.pl?sid=174457&threshold=1&commentsort=0&mode=thread&cid=14515105) discussion. — Leonid Vasilev, Mar 27 '18 at 14:28
You can also try setting `client.NoDelay` to `true` in the first version (this disables Nagle's algorithm; the default is `false`), but probably the right thing to do is to debug your case with [Wireshark](https://www.wireshark.org/). See the [Nagle’s Algorithm is Not Friendly towards Small Requests](https://blogs.msdn.microsoft.com/windowsazurestorage/2010/06/25/nagles-algorithm-is-not-friendly-towards-small-requests/) article by the Windows Azure Storage Team. — Leonid Vasilev, Mar 27 '18 at 14:32
That was - and is - another confusing point to me. I had explicitly tried both, i.e. setting NoDelay to true or to false, but that didn't change anything at all, I don't think any data ever got delayed — Bogey, Mar 27 '18 at 14:44

babu646 · Answer 1 · 2018-03-27T12:22:20.400

0

The first snippet performs four blocking network I/O operations, while the second performs only one. Most types of I/O operations incur quite heavy overhead, so you presumably want to avoid small writes/reads and batch things up.

You should always serialize your data and, if possible, batch it into a single message. This way you avoid as much I/O overhead as possible.
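
As a rough illustration of batching, here is a minimal sketch that keeps BinaryWriter for readability but points it at a MemoryStream, so the socket sees a single write per message. The helper name `BuildMessage` and the loopback listener/client pair are assumptions made only so the example runs on its own (written as C# 9 top-level statements); a real server would use its existing TcpClient.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper: serializes the four values in memory, no socket I/O yet.
static byte[] BuildMessage(long a, bool b, long c, long d)
{
    using (var buffer = new MemoryStream(25)) // 8 + 1 + 8 + 8 bytes
    using (var writer = new BinaryWriter(buffer))
    {
        writer.Write(a);
        writer.Write(b);
        writer.Write(c);
        writer.Write(d);
        writer.Flush();
        return buffer.ToArray();
    }
}

// Loopback pair (assumption) so the sketch is self-contained.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
using var sender = new TcpClient();
sender.Connect(IPAddress.Loopback, ((IPEndPoint)listener.LocalEndpoint).Port);
using var receiver = listener.AcceptTcpClient();
listener.Stop();

byte[] message = BuildMessage(123, true, 456, 2);
sender.GetStream().Write(message, 0, message.Length); // one write for the whole message

// Read the values back on the other end to check the round trip.
var reader = new BinaryReader(receiver.GetStream());
long first = reader.ReadInt64();
bool second = reader.ReadBoolean();
long third = reader.ReadInt64();
long fourth = reader.ReadInt64();
Console.WriteLine($"{message.Length} bytes: {first} {second} {third} {fourth}");
// prints "25 bytes: 123 True 456 2"
```

If the MemoryStream is reused across messages, `GetBuffer()` together with `(int)buffer.Length` avoids the copy made by `ToArray()`; note that `GetBuffer()` may return an array larger than the bytes actually written.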

edited Mar 27 '18 at 12:22

answered Mar 27 '18 at 12:16

babu646

Thanks - is there any guidance how big messages should be? In practice I might need to transfer data that could in the worst case be several hundreds of MB (not likely but who knows) - other than memory consumption, is there any inherent issue in sending this in one batch, i.e. one call to client.GetStream().Write ? – Bogey Mar 27 '18 at 12:23
If you use the .Net TcpClient you should be fine, as it splits the data automatically for you. Max TCP packet size is 64k anyways. – babu646 Mar 27 '18 at 12:28
That's good to know; so if buffering myself, I suppose 64k (64*1024 I suppose?) would be a good limit for the internal buffer – Bogey Mar 27 '18 at 12:35
I can't tell for sure. You should probably test it with 64k and larger sizes to see what hits the sweet spot. But at that point your performance would be limited by your actual bandwidth and network latency. You could remedy this by using asynchronous writes (BeginWrite()). – babu646 Mar 27 '18 at 12:50
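
For the large-payload case discussed in these comments, a hedged sketch using `WriteAsync`, the task-based counterpart of the older `BeginWrite` pattern. The 4 MB payload, the 64 KB read chunk, and the loopback pair are assumptions chosen for illustration only. The write is started first and the receiver drains concurrently, so neither side stalls on full socket buffers.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Loopback pair (assumption) so the sketch is self-contained.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
using var sender = new TcpClient();
sender.Connect(IPAddress.Loopback, ((IPEndPoint)listener.LocalEndpoint).Port);
using var receiver = listener.AcceptTcpClient();
listener.Stop();

// One large message handed to the stream in a single asynchronous call.
byte[] payload = new byte[4 * 1024 * 1024];
Task sendTask = sender.GetStream().WriteAsync(payload, 0, payload.Length);

// Drain on the receiving side while the send is still in flight.
byte[] chunk = new byte[64 * 1024];
int totalRead = 0;
NetworkStream input = receiver.GetStream();
while (totalRead < payload.Length)
{
    int n = await input.ReadAsync(chunk, 0, chunk.Length);
    if (n == 0) break; // connection closed early
    totalRead += n;
}

await sendTask;
Console.WriteLine($"received {totalRead} of {payload.Length} bytes");
```

Which chunk or buffer size performs best is something to measure rather than assume, as the comment above suggests.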

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

The question is probably more about interprocess communication (IPC) than about the TCP protocol. There are multiple options for IPC (see the Interprocess Communications page on Microsoft Dev Center). First you need to define your system requirements (how the system should perform and scale), then you need to choose the simplest option that works best in your particular scenario, using performance metrics.

Relevant excerpt from Performance Culture article by Joe Duffy:

Decent engineers intuit. Good engineers measure. Great engineers do both.

Measure what, though?

I put metrics into two distinct categories:

Consumption metrics. These directly measure the resources consumed by running a test.

Observational metrics. These measure the outcome of running a test, observationally, using metrics “outside” of the system.

Examples of consumption metrics are hardware performance counters, such as instructions retired, data cache misses, instruction cache misses, TLB misses, and/or context switches. Software performance counters are also good candidates, like number of I/Os, memory allocated (and collected), interrupts, and/or number of syscalls. Examples of observational metrics include elapsed time and cost of running the test as billed by your cloud provider. Both are clearly important for different reasons.

As for TCP, I don't see the point of writing data in small pieces when you can write it all at once. You can use BufferedStream to decorate the TCP client's stream instance and use the same BinaryWriter with it. Just make sure you don't mix reads and writes on the same BufferedStream in a way that forces it to seek on the underlying stream (for example, writing after a buffered read), because NetworkStream does not support seeking. See the "Is it better to send 1 large chunk or lots of small ones when using TCP?" and "Why would BufferedStream.Write throw “This stream does not support seek operations”?" discussions on StackOverflow.
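
A minimal sketch of the BufferedStream decoration, assuming a write-only wrapper and a 4096-byte buffer (both are illustrative choices, as is the loopback pair that makes the example self-contained). The four small BinaryWriter writes stay in the buffer and reach the NetworkStream only when `Flush()` is called at the end of the message.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

// Loopback pair (assumption) so the sketch is self-contained.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
using var sender = new TcpClient();
sender.Connect(IPAddress.Loopback, ((IPEndPoint)listener.LocalEndpoint).Port);
using var receiver = listener.AcceptTcpClient();
listener.Stop();

// Write-only decoration of the network stream; 4096 is an assumed buffer size.
var buffered = new BufferedStream(sender.GetStream(), 4096);
var writer = new BinaryWriter(buffered);

writer.Write(123L);
writer.Write(true);
writer.Write(456L);
writer.Write(2L);

int pendingBeforeFlush = receiver.Available; // nothing has been handed to the socket yet
writer.Flush(); // end of message: the buffered 25 bytes go out together

// Reads go through the receiver's own, undecorated stream.
var reader = new BinaryReader(receiver.GetStream());
long v1 = reader.ReadInt64();
bool v2 = reader.ReadBoolean();
long v3 = reader.ReadInt64();
long v4 = reader.ReadInt64();
Console.WriteLine($"{pendingBeforeFlush} bytes before Flush; received {v1} {v2} {v3} {v4}");
// prints "0 bytes before Flush; received 123 True 456 2"
```

Forgetting the `Flush()` at the end of a message leaves the data sitting in the buffer, so it is worth making that call part of whatever "send message" routine wraps this.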

For more information, check the "Example of Named Pipes", "C# Sockets vs Pipes", "IPC Mechanisms in C# - Usage and Best Practices", "When to use .NET BufferedStream class?" and "When is optimisation premature?" discussions on StackOverflow.
