
Is there a valid reason not to use TcpListener, instead of SocketAsyncEventArgs, when implementing a high-performance/high-throughput TCP server?

I've already implemented this high-performance/high-throughput TCP server using SocketAsyncEventArgs and went through all sorts of headaches handling those pinned buffers: a big pre-allocated byte array, pools of SocketAsyncEventArgs objects for accepting and receiving, all put together with some low-level stuff and shiny smart code using TPL Dataflow and Rx. And it works perfectly; almost textbook in this endeavor. Actually, I've learned more than 80% of this stuff from other people's code.

However there are some problems and concerns:

  1. Complexity: I cannot delegate any modifications to this server to another member of the team. That ties me to this kind of task, and I cannot pay enough attention to other parts of other projects.
  2. Memory usage (pinned byte arrays): With SocketAsyncEventArgs, the pools need to be pre-allocated. So to handle 100,000 concurrent connections (the worst case, even spread across different ports), a big pile of RAM sits there uselessly, pre-allocated, even though those conditions occur only occasionally (the server should still be able to handle one or two such peaks every day).
  3. TcpListener actually works well: I actually put TcpListener to the test (with some tricks, like using AcceptTcpClient on a dedicated thread rather than the async version, then sending the accepted connections to a ConcurrentQueue instead of creating Tasks in place, and the like), and with the latest version of .NET it worked very well; almost as well as SocketAsyncEventArgs, with no data loss and a low memory footprint, which helps avoid wasting too much RAM on the server, and no pre-allocation is needed.

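The accept pattern described in point 3 looks roughly like this (a sketch; `Acceptor` and all names here are illustrative, not the actual server code). A dedicated long-running thread does nothing but accept and enqueue, so slow per-connection processing can never stall the accept loop:

```csharp
using System.Collections.Concurrent;
using System.Net.Sockets;
using System.Threading.Tasks;

static class Acceptor
{
    // Starts the listener and returns a queue of accepted connections.
    public static BlockingCollection<TcpClient> Start(TcpListener listener)
    {
        var queue = new BlockingCollection<TcpClient>(new ConcurrentQueue<TcpClient>());
        listener.Start(backlog: 1000);

        // Dedicated thread (LongRunning => not a thread-pool worker):
        // accept as fast as possible, hand off, repeat.
        Task.Factory.StartNew(() =>
        {
            while (!queue.IsAddingCompleted)
                queue.Add(listener.AcceptTcpClient());
        }, TaskCreationOptions.LongRunning);

        return queue;
    }
}
```

Consumers then call `Take()` (or `GetConsumingEnumerable()`) and start per-connection processing elsewhere, so the accept thread never runs application code.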
So why do I not see TcpListener being used anywhere, while everybody (including myself) is using SocketAsyncEventArgs? Am I missing something?

Kaveh Shahbazian
  • @usr Thanks, you are right, and as I've mentioned in point `3.` I'm doing exactly what you say! Accepting happens in a simple loop, on a dedicated `Thread` (a `Task` created with the `TaskCreationOptions.LongRunning` option). – Kaveh Shahbazian Jan 03 '15 at 13:35
  • Then I don't understand why you are suggesting a choice between SocketAsyncEventArgs and TcpListener. Why not use both? The listener is out of play as soon as a connection has been accepted. In my mind it has nothing to do with processing the connection. – usr Jan 03 '15 at 13:36
  • `SocketAsyncEventArgs` causes a `byte` array buffer to get pinned, so it cannot be garbage collected effectively, which causes memory fragmentation and leads to higher CPU and RAM usage; plus there are all the extra hurdles one has to go through to prepare and manage pools of `SocketAsyncEventArgs` objects. All in all, it needs much more extra work and maintenance, and more importantly that knowledge/experience is not easily/safely/reliably transferable to another developer. – Kaveh Shahbazian Jan 03 '15 at 13:40
  • I need clarification on this: Is this question about TcpListener at all? In what way? It seems you are asking: "Can I simply replace SocketAsyncEventArgs with the usual APM/TAP async IO?" This would not have to do anything with TcpListener at all. – usr Jan 03 '15 at 13:42
  • As I've described, `TcpListener` provides a much simpler programming model plus the same performance characteristics as `SocketAsyncEventArgs`. But in all the projects I've studied (like fracture (F#), SocketAwaitable, SuperSocket and many other samples and blog posts here & there) I cannot find anybody using `TcpListener`. Now I ask: why? – Kaveh Shahbazian Jan 03 '15 at 13:46
  • There are no answers there (but it's a dupe nonetheless); for reference: http://stackoverflow.com/questions/21656077/socketasynceventargs-vs-tcplistener-tcpclient – Patrick Jan 03 '15 at 13:51
  • Because people don't know what they're doing when it comes to socket code. The state of sample code and practices is abysmal. – usr Jan 03 '15 at 13:52
  • @Patrick I am not defending this question but that question just asks which one to use. I've actually used both, stress tested both and compared them and put forward some concerns; but if moderators decided to close this, I just accept their judgement. – Kaveh Shahbazian Jan 03 '15 at 14:00
  • I wasn't suggesting closing this as a duplicate, I was just posting it for reference.. take it easy – Patrick Jan 03 '15 at 14:07
  • @KavehShahbazian Why are you using a dedicated thread to accept? What is the advantage over using a Task? – uriDium Nov 05 '21 at 11:49
  • @uriDium If I recall correctly, it helps with making sure all incoming connection requests succeed. In that specific TCP server, we could not afford to drop connections. I do not remember if it helped in practice or not. At the time, the TCP server was working, and I think this question was more of an exploration of the cons of the alternatives (I have re-implemented it in F# (using some lib called helio, I think), then Elixir, then Go. The Elixir version could handle the largest number of concurrent connections. I think the Go version is currently being used there (I'm no longer part of that team)). – Kaveh Shahbazian Nov 06 '21 at 13:53

1 Answer


I see no evidence that this question is about TcpListener at all. It seems you are only concerned with the code that deals with a connection that already has been accepted. Such a connection is independent of the listener.

SocketAsyncEventArgs is a CPU-load optimization. I'm convinced you can achieve a higher rate of operations per second with it. How significant is the difference to normal APM/TAP async IO? Certainly less than an order of magnitude. Probably between 1.2x and 3x. Last time I benchmarked loopback TCP transaction rate I found that the kernel took about half of the CPU usage. That means your app can get at most 2x faster by being infinitely optimized.

Remember that SocketAsyncEventArgs was added to the BCL around 2007, with .NET Framework 3.5, when CPUs were far less capable.

Use SocketAsyncEventArgs only when you have evidence that you need it. It makes you far less productive and creates more potential for bugs.

Here's the template that your socket processing loop should look like:

while (ConnectionEstablished())
{
    var someData = await ReadFromSocketAsync(socket);
    await ProcessDataAsync(someData);
}

Very simple code. No callbacks thanks to await.
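Fleshed out, that template might look like the following sketch (the `processDataAsync` callback stands in for whatever the application does with the bytes; it is not a real API):

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

static class ConnectionHandler
{
    // Reads from the connection until the peer closes it, handing each
    // chunk of received bytes to the application callback.
    public static async Task HandleConnectionAsync(
        TcpClient client, Func<ArraySegment<byte>, Task> processDataAsync)
    {
        using (client)
        using (var stream = client.GetStream())
        {
            var buffer = new byte[4096];
            int read;
            // ReadAsync returns 0 when the peer closes the connection.
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                await processDataAsync(new ArraySegment<byte>(buffer, 0, read));
        }
    }
}
```

The async state machine generated by `await` replaces all the completion callbacks you would otherwise wire up by hand.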


In case you are concerned about managed heap fragmentation: allocate a new byte[1024 * 1024] on startup. When you want to read from a socket, read a single byte into some free portion of this buffer. When that single-byte read completes, query how many bytes are actually there (Socket.Available) and synchronously pull the rest. That way you pin only a single, rather small buffer and can still use async IO to wait for data to arrive.

This technique does not require polling. Since Socket.Available can only increase while we are not reading from the socket, we do not risk accidentally performing a read that is too small.
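A sketch of that single-byte trick (a hypothetical helper, assuming a modern .NET with a task-based `Socket.ReceiveAsync` overload):

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

static class SingleByteReader
{
    // Awaits a 1-byte read (only that tiny buffer is pinned while waiting),
    // then drains whatever Socket.Available reports synchronously.
    public static async Task<byte[]> ReadChunkAsync(Socket socket, byte[] oneByte)
    {
        int n = await socket.ReceiveAsync(new ArraySegment<byte>(oneByte), SocketFlags.None);
        if (n == 0)
            return Array.Empty<byte>();      // peer closed the connection

        // Data has arrived; Available can only have grown since the read
        // completed, so this many bytes are guaranteed not to block.
        int more = socket.Available;
        var result = new byte[1 + more];
        result[0] = oneByte[0];
        if (more > 0)
            socket.Receive(result, 1, more, SocketFlags.None);  // synchronous drain
        return result;
    }
}
```

The `oneByte` buffer is the caller's slice of the big pre-allocated array, so only that one small region is ever pinned across an await.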

Alternatively, you can combat managed heap fragmentation by allocating few very big buffers and handing out chunks.
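A minimal sketch of that idea (illustrative names, not a production pool): one large array is allocated up front, landing on the large object heap where pinning it cannot fragment the normal generational heaps, and fixed-size slices are leased to connections.

```csharp
using System;
using System.Collections.Concurrent;

sealed class ChunkPool
{
    readonly byte[] _slab;
    readonly ConcurrentBag<ArraySegment<byte>> _free = new ConcurrentBag<ArraySegment<byte>>();

    public ChunkPool(int chunkSize, int chunkCount)
    {
        _slab = new byte[chunkSize * chunkCount];   // one big allocation
        for (int i = 0; i < chunkCount; i++)
            _free.Add(new ArraySegment<byte>(_slab, i * chunkSize, chunkSize));
    }

    // Lease a chunk for one pending IO operation.
    public ArraySegment<byte> Rent() =>
        _free.TryTake(out var chunk)
            ? chunk
            : throw new InvalidOperationException("pool exhausted");

    // Give the chunk back once the IO operation has completed.
    public void Return(ArraySegment<byte> chunk) => _free.Add(chunk);
}
```

Each connection rents a chunk for its pending read and returns it afterwards; the slab itself is never moved by the GC, so pinning segments of it is effectively free.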

Or, if you don't find this to be a problem you don't need to do anything.

usr
  • I was with you 100% until you got to the suggestion to use `DataAvailable`. I don't recommend that at all; `DataAvailable` has a classic race issue, in that it theoretically can become `false` between the time you check it and the time you actually try to read (as long as the client never sees an error, TCP is allowed to discard data and force a resend, however unlikely that might be). IMHO, it's much better to just always read as much data from a socket as is practical, and handle any partitioning logic downstream from that, rather than trying to use the socket itself for that. – Peter Duniho Jan 03 '15 at 17:05
  • @PeterDuniho DataAvailable cannot become 0 (it is an int - it is misnamed) if you don't read and it was non-zero. TCP resend stuff is not applicable here. The only reason I recommend DataAvailable here is so that you don't have to "lock up" an entire buffer of, say, 4KB for 100k pending read operations. Rather, read one byte (lock up a tiny buffer) and then drain the rest very quickly once you know that data is there. If you don't like it, you can simply not use DataAvailable here. But you risk that really only one byte was received and the 2nd read now blocks... Maybe that's just unrealistic. – usr Jan 03 '15 at 17:10
  • Dude; my English sucks :( – Kaveh Shahbazian Jan 03 '15 at 19:15
  • @KavehShahbazian if you've got follow-up questions I'm happy to take them. – usr Jan 03 '15 at 19:20
  • Thanks @usr I think I should ask in a code-driven manner. I'll see to it. Thanks; – Kaveh Shahbazian Jan 03 '15 at 19:24
  • @usr: `DataAvailable` is a bool property on the `NetworkStream` class and is just a wrapper for the `Socket.Available` property, which is indeed an `int`. But please note from the docs: "The available data is the total amount of data queued in the _network buffer_ for reading". The network layer is not required to keep data in its buffers unless it's already acknowledged receipt. I agree the most likely implementation will always acknowledge receipt as soon as data is buffered, but this isn't mandatory and I've seen too many platform differences to count on that assumption. YMMV. – Peter Duniho Jan 03 '15 at 22:11
  • Ok, I meant `Socket.Available`. Who knows what the right property is given these ambiguous names?! Anyway, I'm pretty sure this is the data that is stably available to read. Why would it be any other way? The local side has no idea what's coming. It can only tell you what's already there. That's Socket.Available. (In case you're not convinced, treat my answer as if that `Socket.Available` trick was not there. It is not essential to any aspect of this answer.) – usr Jan 03 '15 at 22:12
  • @PeterDuniho I did not make this clear before: I believe that Available tells you the number of bytes that are guaranteed to be available without blocking and I believe that is the purpose of this property. I think the trick in this answer is pretty much the only use case of that property. 99% of the time it is being abused. – usr Jan 03 '15 at 22:25
  • @usr: let's assume for the sake of argument the network layer never discards data. `Available` (wrapper for unmanaged `ioctlsocket(FIONREAD)`) is still problematic, as it requires polling the socket, and necessarily uses the socket inefficiently by failing to read as much data as possible. See e.g. http://microsoft.public.platformsdk.networking.narkive.com/jXh04uo8/select-ioctlsocket-problem#post4. It's also solving a non-issue; I've never seen any issues with the pinned buffers, and if that were an issue, one could just allocate a buffer large enough for the LOH and dole out parts as needed. – Peter Duniho Jan 04 '15 at 05:09
  • @usr: did some digging for resources, [more info in this SO answer (including the same LOH suggestion I mentioned)](http://stackoverflow.com/a/908766/3538012) and [an Internet Archive copy of Mullins' article (sadly no longer online)](http://web.archive.org/web/20080209233229/http://www.coversant.net/Coversant/Blogs/tabid/88/EntryID/10/Default.aspx). Note that one thing Mullins doesn't mention having to worry about is pinned buffers. And PC horsepower is even greater now than when that article was written. Even Jerry's answer, suggesting the same "read 1 byte first", doesn't use `Available`. – Peter Duniho Jan 04 '15 at 05:28
  • @PeterDuniho I don't think we have a common understanding of what I'm proposing because two of your points do not apply: The number that Available returns can never decrease without reading (do you believe otherwise? Why?). This gives us a minimum value to read. We do not require to read everything. If you don't like this just use a fixed-size buffer like 4KB. Does not really matter. But both variants are 100% correct. Polling is not required (why would it? The 1 byte read waits.) – usr Jan 04 '15 at 09:21
  • Heap fragmentation due to pinning is a well-known (I think) problem. https://www.google.com/webhp?complete=1&hl=en&q=.net+heap+fragmentation+pinning They tried to address this many times with new GC features. Yes, there are other solutions, such as using a few huge buffers. Those are valid as well. – usr Jan 04 '15 at 09:22