
A TCP server is developed as a Windows Service using SocketAsyncEventArgs and its async methods. I have these two lines of code at the beginning of Main:

ThreadPool.SetMaxThreads(15000, 30000);
ThreadPool.SetMinThreads(10000, 20000);

Both calls return true (the return values are logged). Now 2,000 to 3,000 clients start sending messages to this server, and it accepts the connections (I count the number of connections and it is as expected; there is a connection pool). The thread count of the server process grows to roughly 2,050 to 3,050. So far so good!

Now there is a Received method, which is called either directly when ReceiveAsync returns false (i.e. the operation completed synchronously) or from the Completed event of SocketAsyncEventArgs.

And here the problems begin: no matter how many clients are connected and how many messages they send, Received is called at most 20 times per second! And as the number of clients increases, that number drops to around 10.

Environment: the TCP server and the clients are simulated on the same machine. I have tested the code on two machines, one with a 2-core CPU and 4 GB RAM, the other with an 8-core CPU and 12 GB RAM. There is no data loss (yet), and sometimes I receive more than one message per receive operation. That's fine. But how can the number of receive operations be increased?

Additional notes on implementation: the code is large and includes many different pieces of logic. An overall description: I have a single SocketAsyncEventArgs for accepting new connections, and it works great. For each newly accepted connection I create a new SocketAsyncEventArgs for receiving data and put it in a pool. It is not reused, but its UserToken is used for tracking connections; for example, connections that have disconnected, or that have not sent any data for 7 minutes, are closed and disposed (the AcceptSocket of the SocketAsyncEventArgs is shut down in both directions, closed, and disposed, and so is the SocketAsyncEventArgs object itself). Here is a `Sudo` (pseudo) class that performs these tasks; all other logic, logging, and error checking has been removed to keep it simple and clear (maybe that makes it easier to spot the problematic code):

class Sudo
{
    Socket _listener;
    int _port = 8797;

    public Sudo()
    {
        var ipEndPoint = new IPEndPoint(IPAddress.Any, _port);
        _listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        _listener.Bind(ipEndPoint);

        _listener.Listen(100);

        Accept(null);
    }

    void Accept(SocketAsyncEventArgs acceptEventArg)
    {
        if (acceptEventArg == null)
        {
            acceptEventArg = new SocketAsyncEventArgs();
            acceptEventArg.Completed += AcceptCompleted;
        }
        else acceptEventArg.AcceptSocket = null; // reuse the same args for the next accept

        bool willRaiseEvent = _listener.AcceptAsync(acceptEventArg);

        // AcceptAsync returns false when the accept completed synchronously; Completed will not fire.
        if (!willRaiseEvent) Accepted(acceptEventArg);
    }

    void AcceptCompleted(object sender, SocketAsyncEventArgs e)
    {
        Accepted(e);
    }

    void Accepted(SocketAsyncEventArgs e)
    {
        var acceptSocket = e.AcceptSocket;
        var readEventArgs = CreateArg(acceptSocket);

        var willRaiseEvent = acceptSocket.ReceiveAsync(readEventArgs);

        // Immediately post the next accept so new connections keep coming in.
        Accept(e);

        // ReceiveAsync returning false means the receive completed synchronously.
        if (!willRaiseEvent) Received(readEventArgs);
    }

    SocketAsyncEventArgs CreateArg(Socket acceptSocket)
    {
        var arg = new SocketAsyncEventArgs();
        arg.Completed += IOCompleted;

        var buffer = new byte[64 * 1024];
        arg.SetBuffer(buffer, 0, buffer.Length);

        arg.AcceptSocket = acceptSocket;

        arg.SocketFlags = SocketFlags.None;

        return arg;
    }

    void IOCompleted(object sender, SocketAsyncEventArgs e)
    {
        switch (e.LastOperation)
        {
            case SocketAsyncOperation.Receive:
                Received(e);
                break;
            default: break;
        }
    }

    void Received(SocketAsyncEventArgs e)
    {
        if (e.SocketError != SocketError.Success || e.BytesTransferred == 0 || e.Buffer == null || e.Buffer.Length == 0)
        {
            // Kill(e);
            return;
        }

        var bytesList = new List<byte>();
        for (var i = 0; i < e.BytesTransferred; i++) bytesList.Add(e.Buffer[i]);

        var bytes = bytesList.ToArray();

        Process(bytes);

        ReceiveRest(e);

        Perf.IncOp();
    }

    void ReceiveRest(SocketAsyncEventArgs e)
    {
        e.SocketFlags = SocketFlags.None;
        // Zero the buffer and reset the offset/count before posting the next receive.
        for (int i = 0; i < e.Buffer.Length; i++) e.Buffer[i] = 0;
        e.SetBuffer(0, e.Buffer.Length);

        var willRaiseEvent = e.AcceptSocket.ReceiveAsync(e);
        if (!willRaiseEvent) Received(e);
    }

    void Process(byte[] bytes) { }
}
Kaveh Shahbazian
  • MB RAM? Is that I-cache? I hope you have several GB of memory. Have you used Resource Monitor or Process Explorer to look for bottlenecks? – HABO Dec 26 '12 at 19:17
  • Hmm ... What is the asynchronous architecture on your server? Could you post some pseudo code maybe to describe how the listen happens and how connections get accepted, etc? – Thilak Nathen Dec 26 '12 at 19:19
  • @ThilakNathen I have added a description. I could not factor out the code very well because there were a lot of different classes and structures involved but (as far as I understand) I just did the very basic architecture with SocketAsyncEventArgs async methods. – Kaveh Shahbazian Dec 26 '12 at 19:31
  • @ThilakNathen I have succeeded in factoring out the server (from my own code! Maybe I am just too tired ;). Please be kind and comment. – Kaveh Shahbazian Dec 26 '12 at 19:49
  • I don't see anything immediately wrong with the code you posted. Copying the buffer like that will be slow, but I don't think that would be the cause. Also, I assume `Kill(e)` disposes of the `SocketAsyncEventArgs`? – Cory Nelson Dec 26 '12 at 20:05

1 Answer


The reason it's slowing down is that each one of those threads needs to context switch, and that's a relatively expensive operation. The more threads you add, the larger the percentage of your CPU that is spent simply on context switching rather than in your actual code.

You've hit this one-thread-per-client bottleneck in a rather odd way. The whole point of server-side async is to reduce the number of threads -- to not have one thread per client, but ideally only one or two for each of the logical processors in your system.

The async code you posted looks fine, so I can only guess that your Process method has some easy-to-overlook non-async, blocking I/O in it, e.g. database or file access. When I/O blocks, the .NET thread pool detects this and automatically spins up a new thread -- it's basically spiraling out of control here, with the I/O in Process as the bottleneck.
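For illustration only (this is not from the question's code; the file path is invented), a Process along these lines is the kind of blocking call meant here -- it parks the IOCP callback thread that invoked it until the disk write finishes, and the thread pool compensates by adding threads:

// Hypothetical example of a blocking Process -- not the asker's actual method.
void Process(byte[] bytes)
{
    // Synchronous file I/O: the thread that ran the Completed callback sits idle here,
    // so it services no further receives until the write returns.
    System.IO.File.WriteAllBytes(@"C:\temp\last-message.bin", bytes);
}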

An async pipeline really needs to be 100% async to get any significant benefit from it. Going only halfway will have you writing complex code that performs just as poorly as simple synchronous code.

If you absolutely can't make the Process method purely async, you might have some luck faking it. Have things wait in a queue for processing by a small, limited-size thread pool.
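A minimal sketch of that idea, assuming a BlockingCollection-based queue and a handful of dedicated worker threads (the class and its names are illustrative, not part of the question's code; it needs System.Collections.Concurrent and System.Threading):

// Received only enqueues the bytes and returns immediately; a small, fixed number of
// worker threads drain the queue and do the blocking Process work, so the IOCP
// callback threads are never blocked and the thread pool has no reason to grow.
class ProcessQueue
{
    readonly BlockingCollection<byte[]> _queue;

    public ProcessQueue(Action<byte[]> process, int workerCount = 4, int capacity = 10000)
    {
        _queue = new BlockingCollection<byte[]>(capacity);

        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(() =>
            {
                // GetConsumingEnumerable blocks only these dedicated threads.
                foreach (var bytes in _queue.GetConsumingEnumerable())
                    process(bytes);
            });
            worker.IsBackground = true;
            worker.Start();
        }
    }

    public void Enqueue(byte[] bytes)
    {
        // Returns immediately unless the queue is full, which applies back-pressure
        // to the receive loop instead of letting work pile up on ever more threads.
        _queue.Add(bytes);
    }
}

Received would then call Enqueue(bytes) instead of Process(bytes), and the worker count can be tuned to whatever the blocking I/O can actually sustain.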

Cory Nelson
  • In this Windows Service I do some processing on the received data in memory, and the only place where I talk to anything outside my program is a method that puts the processed data into an MSMQ queue. – Kaveh Shahbazian Dec 26 '12 at 19:51
  • You were right about what you pointed out regarding databases and files (and, in my case, MSMQ). I forgot about the damn MSMQ with the recoverable option from my other projects (it has awful performance compared to RabbitMQ, for example, but here I am forced to use it; any comments on MSMQ performance are welcome too). So please edit your answer to properly point out the problem caused by other activities outside the main async workflow, so I can mark it as the answer. Thanks again! – Kaveh Shahbazian Dec 26 '12 at 21:24
  • Awesome, glad you found an answer. Yea, MSMQ really does suck when recoverable mode is turned on :). – Cory Nelson Dec 26 '12 at 21:40