I'm writing a system in which two applications (say, a server and a client) communicate over a TCP connection on localhost.
The code is fairly performance-critical, so I'm trying to optimize it as much as possible.
The code below is from the server application. To send messages, my naive approach was to create a BinaryWriter from the TcpClient's stream and write each value of the message via the BinaryWriter. So let's say the message consists of four values: a long, followed by a boolean, and then two more longs. The naive approach was:
TcpClient client = ...;
var writer = new BinaryWriter(client.GetStream());
// The following takes ca. 0.55ms:
writer.Write((long)123);
writer.Write(true);
writer.Write((long)456);
writer.Write((long)2);
At ca. 0.55 ms execution time, this strikes me as fairly slow. I then tried the following instead:
TcpClient client = ...;
// The following takes ca. 0.15ms:
var b1 = BitConverter.GetBytes((long)123);
var b2 = BitConverter.GetBytes(true);
var b3 = BitConverter.GetBytes((long)456);
var b4 = BitConverter.GetBytes((long)2);
var result = new byte[b1.Length + b2.Length + b3.Length + b4.Length];
Array.Copy(b1, 0, result, 0, b1.Length);
Array.Copy(b2, 0, result, b1.Length, b2.Length);
Array.Copy(b3, 0, result, b1.Length + b2.Length, b3.Length);
Array.Copy(b4, 0, result, b1.Length + b2.Length + b3.Length, b4.Length);
client.GetStream().Write(result, 0, result.Length);
The latter runs in ca. 0.15 ms, while the first approach took roughly 0.55 ms, so the BinaryWriter version is 3-4 times slower.
I'm wondering ... why? And more importantly, what would be the best way to write messages as fast as possible (while maintaining at least a minimum of code readability)?
The only way I can think of right now is to create a custom class similar to BinaryWriter that, instead of writing each value directly to the stream, buffers a certain amount of data (say, 10,000 bytes) and only sends it to the stream when its internal buffer is full or when some .Flush() method is explicitly called (e.g. once a message has been fully written).
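As a rough sketch of what I have in mind (BufferedMessageWriter is a name I made up, not a framework type; it only handles long and bool, and the 10,000-byte default is just the figure mentioned above, not a tuned value):

```csharp
using System;
using System.IO;

// Hypothetical helper: collects values in an in-memory buffer and hands them
// to the underlying stream in one Write call instead of one call per value.
public sealed class BufferedMessageWriter
{
    private readonly Stream _stream;
    private readonly byte[] _buffer;
    private int _count; // number of buffered bytes not yet sent

    public BufferedMessageWriter(Stream stream, int capacity = 10000)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (capacity < sizeof(long)) throw new ArgumentOutOfRangeException(nameof(capacity));
        _stream = stream;
        _buffer = new byte[capacity];
    }

    public void Write(long value)
    {
        EnsureSpace(sizeof(long));
        // Little-endian, the same byte order BinaryWriter uses.
        for (int i = 0; i < sizeof(long); i++)
            _buffer[_count++] = unchecked((byte)(value >> (8 * i)));
    }

    public void Write(bool value)
    {
        EnsureSpace(1);
        _buffer[_count++] = value ? (byte)1 : (byte)0;
    }

    // Sends everything buffered so far, then flushes the underlying stream.
    public void Flush()
    {
        SendBuffer();
        _stream.Flush();
    }

    // Sends the buffer early if the next value would not fit.
    private void EnsureSpace(int size)
    {
        if (_count + size > _buffer.Length) SendBuffer();
    }

    private void SendBuffer()
    {
        if (_count == 0) return;
        _stream.Write(_buffer, 0, _count);
        _count = 0;
    }
}
```

For the example message, that would be four Write calls on a writer constructed over client.GetStream(), followed by a single Flush(), so the stream should see one 25-byte write instead of four small ones.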
This should work, but I wonder whether I'm overcomplicating things and there's a simpler way to achieve good performance. And if this is indeed the best way, any suggestions on how big the internal buffer should ideally be? Does it make sense to align it with Winsock's send and receive buffers, or is it best to make it as big as possible (or rather, as big as is sensible given memory constraints)?
Thanks!