4

I have the following async TCP server that I use to accept incoming requests from clients that want to upload files. After say about 1-4 hours, the server simply stops accepting any new connections. This has been baffling me because I am unable to reproduce the error deterministically. The rest of the threads in my program continue to function properly. There are no exceptions thrown whatsoever. Any suggestions on what could be happening?

Before the server goes dead, all I do see is that it manages to finish the pending requests and then stops. I have a hunch that the server is experiencing frequent network disconnects. How do I make this piece of code more robust to failures? I've added try-catch to both pieces of code that might fail but I feel I'm still missing something.

I setup another thread to check if this server releases the socket but it seems like the socket is in use even after it stops processing client requests. I verified this using netstat -a

internal class TCPServer
{
    private readonly int _listeningPort;
    private TcpListener listener;

    public TCPServer(int port)
    {
        _listeningPort = port;
        listener = new TcpListener(IPAddress.Any, _listeningPort);
        listener.Start(int.MaxValue);
    }

    public async void Start()
    {
        while (true)
        {
            Log.Verbose("Waiting for connections...");
            try
            {
                var tcpClient = await listener.AcceptTcpClientAsync();
                Task t = HandleConnectionAsync(tcpClient);
                await t;
            }
            catch (Exception exp)
            {
                Log.Fatal(exp.ToString());
            }
        }
    }

    private async Task HandleConnectionAsync(TcpClient tcpClient)
    {
        try
        {
            string outputFile = ""; // get a random string

            using (var stream = tcpClient.GetStream())
            using (var output = File.Create(outputFile))
            {
                //...

                while ((bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                {
                    // ...
                    await output.WriteAsync(buffer, 0, bytesRead);
                }
            }

            tcpClient.Close();
        }
        catch (Exception exp)
        {
            Log(exp.Message);
        }
    }
Legend
  • 113,822
  • 119
  • 272
  • 400

2 Answers2

6

There are three problems:

  1. You process connections one by one. One faulty/hanging connection holds up the entire server. Simply remove await t;. You are already logging all errors in the worker function which is good.
  2. Make HandleConnectionAsync return very quickly so that accepting new connections can quickly resume. I'd sacrifice a tiny amount of efficiency and just say Task.Run(() => HandleConnectionAsync(tcpClient)) to be sure that processing can continue immediately. Right now all the file opening stuff is synchronous and holding up the accept workflow.
  3. Deterministically close tcpClient by wrapping it: using (tcpClient) .... That prevents zombie clients from hanging around in case of an error.
usr
  • 168,620
  • 35
  • 240
  • 369
  • 1
    +1 These are brilliant observations. In retrospect, this seems like such a silly mistake to make. Thank you so much for the clarification. The server hasn't crashed once in the last three days :) I'll continue monitoring. One final question: Am I doing the network read and file write correctly using the await operation inside the worker task? To me, it looks correct but I just want to confirm through a second pair of eyes. Also, can you add some information on why the await was causing the problem in the first place? – Legend Apr 06 '15 at 19:16
  • @Legend the read and write is correct. You can (heuristically) tell that by the fact that the await code looks just like the synchronous code but with await mechanically slapped on it. Making code async with await is easy if you don't do anything funky. I'll answer the last question with a question: What, in your mind, does `await t;` do? – usr Apr 06 '15 at 19:27
  • Thanks! My understanding is that it will make the thread wait for the `Task t` to finish so it will wait before it accepts another connection? – Legend Apr 07 '15 at 02:34
  • That is almost correct. It's just that this has nothing to do with threads since the purpose of await is to release threads and not make them wait/block. But yes, that's why it doesn't process the next connection right then. – usr Apr 07 '15 at 08:59
-1

You should check, if you are still connected.

Maybe this is useful: How to test for a broken connection of TCPClient after being connected?

Community
  • 1
  • 1
NewDeveloper
  • 120
  • 3
  • 15