97

Let's say I'm running a simple server and have accept()ed a connection from a client.

What is the best way to tell when the client has disconnected? Normally, a client is supposed to send a close command, but what if it disconnects manually or loses network connection altogether? How can the server detect or handle this?

Zxaos
  • 7,791
  • 12
  • 47
  • 61
  • Look here (for the worst case scenarios): http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html (Checking for dead peers) – Blauohr Nov 12 '08 at 09:05
  • 4
    Because there are so many wrong and misleading answers, here's the right one: Follow the specification for the protocol you are implementing on top of TCP. It should specify whether this is done by timeouts, write failures, or some other mechanism. If you are designing a protocol, make sure to design some way to detect client disconnection, if that is required. – David Schwartz Jul 19 '16 at 06:58

10 Answers

153

In TCP there is only one way to detect an orderly disconnect, and that is by getting zero as a return value from read()/recv()/recvXXX() when reading.

There is also only one reliable way to detect a broken connection: by writing to it. After enough writes to a broken connection, TCP will have done enough retries and timeouts to know that it's broken and will eventually cause write()/send()/sendXXX() to return -1 with an errno/WSAGetLastError() value of ECONNRESET, or in some cases 'connection timed out'. Note that the latter is different from 'connect timeout', which can occur in the connect phase.
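
For illustration, here is a minimal sketch of the read side under those rules, assuming a POSIX-style socket 'sock' that has already been accept()ed; it is a sketch, not a drop-in implementation:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch: 'sock' is an already-connected TCP socket. */
void read_until_closed(int sock)
{
    char buf[4096];
    for (;;) {
        ssize_t n = recv(sock, buf, sizeof buf, 0);
        if (n > 0) {
            /* n bytes of application data received; process them here */
            continue;
        }
        if (n == 0) {
            /* orderly disconnect: the peer sent a FIN */
            printf("peer closed the connection\n");
            break;
        }
        if (errno == EINTR)
            continue;                /* interrupted; retry */
        /* broken connection: typically ECONNRESET or ETIMEDOUT */
        printf("connection error: %s\n", strerror(errno));
        break;
    }
    close(sock);
}

Note that the -1 case for a silently broken connection generally only materialises after writes have exhausted TCP's retries, as described above; a read alone does not probe the network.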

You should also set a reasonable read timeout, and drop connections that fail it.
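
One hedged sketch of such a timeout, using SO_RCVTIMEO (select()/poll() with a timeout works equally well); the function name and values here are just placeholders:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Sketch: make blocking reads on 'sock' give up after 'seconds' of silence.
   recv() then returns -1 with errno EWOULDBLOCK/EAGAIN, and the caller can
   decide whether to drop the connection. */
int set_read_timeout(int sock, int seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0) {
        perror("setsockopt(SO_RCVTIMEO)");
        return -1;
    }
    return 0;
}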

The answer here about ioctl() and FIONREAD is complete nonsense. All that does is tell you how many bytes are presently in the socket receive buffer, available to be read without blocking. If a client doesn't send you anything for five minutes, that doesn't constitute a disconnect, but it does cause FIONREAD to be zero. Not the same thing: not even close.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • Should note that reset can also happen under certain normal conditions and doesn't necessarily mean the other party is no longer listening. Should also note that just because you time out receiving doesn't mean the other party isn't listening anymore either. – Jay Mar 18 '15 at 20:23
  • 3
    @Jay The question is about how to detect TCP disconnects, not about what causes connection resets. There are many causes of 'connection reset', and I don't agree that any of them constitutes 'normal operation'. It is an abnormal condition by definition. – user207421 Mar 18 '15 at 20:37
  • Only if you consider walking out of wifi service abnormal... or excess signal noise, data collisions or otherwise normal example. – Jay Mar 18 '15 at 21:04
  • @Jay I don't know what you're talking about now. I repeat. Anything that causes a connection reset is abnormal by definition, and it *does* mean in effect that the other party isn't listening, because there is nothing to listen to. And I note that I had already stated exactly what you said I should state about timeouts. You seem to be just pointlessly nitpicking. – user207421 Apr 01 '15 at 23:19
  • @Jay I suggest you research the difference between 'abnormal', which is what I said, and 'normal', which is how you have just misquoted me. – user207421 May 27 '15 at 20:24
  • Is it really necessary to keep calling write() until you get an error? In my experience, it seemed like a single failed write() would cause select() to signal a read ready, and that read() would return a connection reset error. – user1055568 Jul 07 '15 at 03:28
  • 2
    @user1055568 A single write usually just gets buffered and sent over the network asynchronously, unless it is very large. You need to issue enough writes so that all the internal timers and retries have been exhausted on the original write for an error to be detected. – user207421 Sep 14 '15 at 21:26
  • @EJP, yes, but eventually the retries on that single write will be exhausted, and then TCP will want to signal the application that the connection is closed. If you are waiting on "read ready" with select/epoll/kevent it will signal that, so you can pick up the error on the read(). At least that is how systems I am familiar with behave. – user1055568 Sep 16 '15 at 04:49
  • @user1055568 The point is that it won't, *can't*, get it on the `send()` that caused the error. – user207421 Sep 16 '15 at 10:31
  • 1
    @EJP, but that is far cry from saying you need to keep calling send() until you get an error. – user1055568 Sep 16 '15 at 15:30
  • @user1055568 It certainly is. I don't know why you suggest it. I have only said that you won't get an error on send until you do so. – user207421 Dec 09 '15 at 04:20
  • @EJP You said "You need to issue enough writes so that all the internal timers and retries have been exhausted." I read that as "you" the application, calling TCP API. Now I realize you meant it as "you" the system, running TCP protocol. You also say you can only pick up this error on a subsequent send(), but that is false, as I pointed to in my original comment. – user1055568 Dec 10 '15 at 16:45
  • 2
    If the application doesn't keep issuing writes, there's no guarantee that it will have issued any writes after the connection broke. While one write issued after the connections fails is sufficient, the connection can fail at any time, and if you ever stop writing indefinitely, you have no way to know you issued even one write after the connection failed. – David Schwartz Dec 21 '15 at 19:29
  • @user1055568 I don't mean that at all. I meant what I wrote. *The application* needs to have issued enough writes so that TCP has accumulated an error condition on the connection. Your original comment remains incorrect, for the same reason. – user207421 Jan 07 '16 at 10:33
  • @EJP, then you are wrong on the implementations I am familiar with. A single application level send() is sufficient for TCP to detect the connection is broken. After a sufficient number of internal retries, it will signal the failure with a "read ready" condition on select() and a subsequent error on the socket recv() or by directly returning an error on epoll(). – user1055568 Jan 08 '16 at 16:13
  • 1
    @user1055568 A single application send is enough for TCP to eventually detect the error, but insufficient for the application to be given the error *on that send*, because the send is asynchronous. The application has to do something else with the socket, after the error has been detected by TCP, so that it can be given the error: either another send or another receive. I've already said all this, multiple times. – user207421 Jan 24 '16 at 06:09
  • 3
    @EJP And I have said multiple times that if the app is waiting on select/epoll/kevent for read ready, then it will be alerted to do a read to pick up the error. You have disputed this, repeatedly insisting that it must do more writes. You have said nothing about reads, and with epoll, in fact, there is no need for a read or a write as the epoll can signal the timeout directly. Probably kevent too. – user1055568 Jan 24 '16 at 17:50
  • 2
    @user1055568 If you only do reads, you aren't doing anything to the network, so you aren't going to encounter any error conditions unless the peer is obliging enough to do a reset. If you write, you are doing things to the network, so you are guaranteed, eventually, to encounter an error condition if there is one. – user207421 Jul 12 '16 at 10:11
  • @EJP True, I only disputed the claim that you must do multiple writes. A single write will eventually generate an internal time out error, which can be signaled to the app layer via the read interface. – user1055568 Jul 13 '16 at 16:27
  • 1
    @user1055568 You must do multiple *somethings*, otherwise you aren't *exercising* the user interface, so there is no opportunity to receive the error. Doing multiple reads is not sufficient for the reason I stated. Doing multiple writes is sufficient. Doing a write and then multiple reads is also sufficient, and I have stated nothing to the contrary. – user207421 Sep 27 '17 at 10:58
  • 2
    @EJP You are talking nonsense. Doing a single read after the single write has timed out internally is sufficient to pick up the error. If you are waiting on I/O events with select/epoll/kqueue you will be alerted when this occurs. – user1055568 Sep 28 '17 at 15:40
  • https://man7.org/linux/man-pages/man2/recv.2.html#RETURN_VALUE return value of 0 from recv() will not always indicate orderly disconnect: if the user requests 0 bytes to receive, return value of 0 won't indicate disconnect here. – avernus Aug 26 '21 at 20:36
  • @user1055568 You are wrong. You must *never* stop writing if you need to detect a connection failure. If you *ever* stop writing, you will have no way to ever detect a connection failure that occurs after that write completed. So the only reliable way to ensure you detect a connection failure is to keep performing writes. – David Schwartz Jun 23 '23 at 10:03
  • @DavidSchwartz Obviously, writing a single packet will not detect a future connection failure. But, OP suggested multiple writes are necessary to detect a currently broken connection, by triggering "enough retries and timeouts". 1 write will do that if the connection is broken. – user1055568 Jun 24 '23 at 01:24
  • @user1055568 Maybe OP somehow suggested that to you, but OP didn't say that. And the actual text the OP wrote doesn't suggest that to me. It correctly warns that you may have several writes succeed after a connection has broken. And it is a fact that if you need to eventually detect any connection failure that occurs, you cannot ever stop writing indefinitely. – David Schwartz Jun 24 '23 at 02:33
  • @DavidSchwartz He said "after enough writes to a broken connection". It takes only 1 write to a broken connection for TCP to determine it is broken, and then the app will be signaled by the next read, write or select/epoll/kqueue. – user1055568 Jun 26 '23 at 20:12
  • @user1055568 You are assuming that by "writes" he means writes at application level, but he could just as well be referring to writes at network level. It takes several of those. – David Schwartz Jul 04 '23 at 11:36
16

To expand on this a bit more:

If you are running a server you either need to use TCP_KEEPALIVE to monitor the client connections, or do something similar yourself, or have knowledge about the data/protocol that you are running over the connection.

Basically, if the connection gets killed (i.e. not properly closed) then the server won't notice until it tries to write something to the client, which is what the keepalive achieves for you. Alternatively, if you know the protocol better, you could just disconnect on an inactivity timeout anyway.
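
On Linux, for example, turning keepalive on and tightening its timers might look roughly like this (a sketch only: TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific, and the chosen values are arbitrary):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Sketch: enable keepalive probes on an accepted client socket so a dead
   peer is eventually noticed even if the server never writes to it. */
int enable_keepalive(int sock)
{
    int on = 1;
    int idle = 60;      /* seconds of idleness before the first probe */
    int interval = 10;  /* seconds between probes */
    int count = 5;      /* unanswered probes before the connection is dropped */

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0 ||
        setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0 ||
        setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) < 0 ||
        setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0) {
        perror("setsockopt(keepalive)");
        return -1;
    }
    return 0;
}

Once the probes go unanswered, a blocked or subsequent recv() on that socket fails (typically with ETIMEDOUT), which is how the server finally notices the dead client.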

Peter Jeffery
  • 425
  • 3
  • 5
  • The server should also set a reasonable read timeout and drop connections that fail it. – user207421 Jul 15 '13 at 22:44
  • Drop the connection that fails it? What if the timeout is the default recommended 200 msec? Shouldn't it back off to a certain reasonable timeout? Maybe that will cause too much context switching for you? Still, dropping a connection when such a `Timeout` is so low is not sound advice... – Jay Apr 02 '15 at 02:03
  • On Winsock2, if keepalive is polling every 5 seconds and I have some blocking send or recv call, will keepalive work properly? Also, what are the minimum limits for keepalive timeout and interval? – Anurag Daware Sep 01 '15 at 13:58
  • 1
    @EJP, What OS is that? The default read timeout for most OS was 0.5 - 5 seconds last I checked... rfc for tcp specifically says tcp has a 0.2 second default.... – Jay Jan 07 '16 at 15:53
  • @Jay I don't know what you're talking about. The default value for SO_RCVTIMEO is infinite, on all operating systems. Otherwise everybody would get read timeouts all the time. Your suggestions of 200ms etc are preposterous. – user207421 Sep 05 '16 at 13:04
  • @Jay And you are confusing internal TCP timers with read timeouts. They aren't the same thing. No RFC dictates the TCP/IP Sockets API. – user207421 Aug 30 '23 at 22:31
2

If you're using overlapped (i.e. asynchronous) I/O with completion routines or completion ports, you will be notified immediately (assuming you have an outstanding read) when the client side closes the connection.
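
A rough Winsock2 sketch of that idea, using an event rather than a completion routine or port, with error handling abbreviated (completion-port code would be structured differently):

#include <winsock2.h>
#include <stdio.h>

/* Sketch: post an overlapped read on a connected socket and interpret its
   completion. Zero bytes transferred means the peer closed its side. */
void watch_socket(SOCKET sock)
{
    char data[4096];
    WSABUF wsabuf = { sizeof data, data };
    WSAOVERLAPPED ov = { 0 };
    DWORD transferred = 0, flags = 0;

    ov.hEvent = WSACreateEvent();
    if (WSARecv(sock, &wsabuf, 1, NULL, &flags, &ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING) {
        printf("WSARecv failed: %d\n", WSAGetLastError());
        WSACloseEvent(ov.hEvent);
        return;
    }

    WSAWaitForMultipleEvents(1, &ov.hEvent, TRUE, WSA_INFINITE, FALSE);
    if (!WSAGetOverlappedResult(sock, &ov, &transferred, FALSE, &flags))
        printf("connection broken: %d\n", WSAGetLastError());  /* e.g. WSAECONNRESET */
    else if (transferred == 0)
        printf("peer closed the connection\n");
    else
        printf("received %lu bytes\n", (unsigned long)transferred);

    WSACloseEvent(ov.hEvent);
}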

Graeme Perrow
  • 56,086
  • 21
  • 82
  • 121
  • Not quite. You will be notified immediately you read to end of stream. It could take a finite time if there was significant data in flight from the client before the close. – user207421 Jan 31 '14 at 12:02
1

Try looking for EPOLLHUP or EPOLLERR. See: How do I check client connection is still alive

Reading and looking for 0 will work in some cases, but not all.
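
A minimal Linux-only sketch of that check; 'epfd' from epoll_create1() and 'client' from accept() are assumed, and EPOLLRDHUP is included because it reports the peer's half-close:

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Sketch: register a connected client socket and watch for hangup/error. */
void watch_client(int epfd, int client)
{
    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN | EPOLLRDHUP;   /* EPOLLHUP/EPOLLERR are always reported */
    ev.data.fd = client;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, client, &ev) < 0) {
        perror("epoll_ctl");
        return;
    }

    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, -1);
    for (int i = 0; i < n; i++) {
        if (events[i].events & (EPOLLHUP | EPOLLERR | EPOLLRDHUP)) {
            printf("fd %d disconnected or errored\n", events[i].data.fd);
            close(events[i].data.fd);
        } else if (events[i].events & EPOLLIN) {
            /* recv() here; a return of 0 still means an orderly close */
        }
    }
}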

Community
  • 1
  • 1
Trade-Ideas Philip
  • 1,067
  • 12
  • 21
0

TCP has "open" and a "close" procedures in the protocol. Once "opened", a connection is held until "closed". But there are lots of things that can stop the data flow abnormally. That being said, the techniques to determine if it is possible to use a link are highly dependent on the layers of software between the protocol and the application program. The ones mentioned above focus on a programmer attempting to use a socket in a non-invasive way (read or write 0 bytes) are perhaps the most common. Some layers in libraries will supply the "polling" for a programmer. For example Win32 asych (delayed) calls can Start a read that will return with no errors and 0 bytes to signal a socket that cannot be read any more (presumably a TCP FIN procedure). Other environments might use "events" as defined in their wrapping layers. There is no single answer to this question. The mechanism to detect when a socket cannot be used and should be closed depends on the wrappers supplied in the libraries. It is also worthy to note that sockets themselves can be reused by layers below an application library so it is wise to figure out how your environment deals with the Berkley Sockets interface.

jlpayton
  • 19
  • 2
0

I had a similar issue where my server would just blindly send data after a connection had been made, but then had difficulty detecting whether the other side was still listening. I used the TCP_USER_TIMEOUT option: https://man7.org/linux/man-pages/man7/tcp.7.html

To set this option, don't forget to use SOL_TCP instead of SOL_SOCKET as the level:

unsigned int timeout = 5000; // timeout in ms

if (setsockopt(yourSocket, SOL_TCP, TCP_USER_TIMEOUT, &timeout, sizeof(timeout)) < 0)
    fprintf(stderr, "setsockopt(TCP_USER_TIMEOUT) failed");

If a message stays in the send buffer for longer than "timeout" ms, an error will be raised; in my case it seems to get raised by a blocking recv().

owndampu
  • 55
  • 8
-3

It's really easy to do, reliable, and not messy:

        Try
            Clients.Client.Send(BufferByte)
        Catch verror As Exception
            BufferString = verror.ToString
        End Try
        If BufferString <> "" Then
            EventLog.Text &= "User disconnected: " + vbNewLine
            Clients.Close()
        End If
user207421
  • 305,947
  • 44
  • 307
  • 483
  • It's not reliable. It doesn't distinguish between orderly and disorderly closes, and it doesn't even work until at least two sends have occurred, because of the socket send buffer. – user207421 Jul 19 '13 at 00:44
-3

I toyed with a few solutions but this one seems to work best for detecting host and/or client disconnection in Windows. It is for non-blocking sockets, and derived from IBM's example.

char buf;
int length = recv(socket, &buf, 0, 0);   // zero-byte receive: probes the socket without consuming data
int nError = WSAGetLastError();
if (nError != WSAEWOULDBLOCK && nError != 0) {
    return 0;   // a real error: treat the connection as gone
}
if (nError == 0) {
    if (length == 0) return 0;   // no error and a zero-byte result: treated here as a disconnect
}
  • A recv() doesn't do anything on the wire, so it can't trigger any detection of cable pulls etc. Only a send() can do that. – user207421 Jan 07 '14 at 11:19
-4

The return value of recv() will be -1 if the connection is lost; otherwise it will be the number of bytes received.

void ReceiveStream(void *threadid)
{
    while(true)
    {
        while(ch==0)
        {
            char buffer[1024];
            int newData = recv(thisSocket, buffer, sizeof(buffer)-1, 0);
            if(newData > 0)
            {
                buffer[newData] = '\0';   // recv() does not null-terminate the data
                std::cout << buffer << std::endl;
            }
            else   // 0 = orderly close, -1 = error (e.g. connection reset)
            {
                std::cout << "Client disconnected" << std::endl;
                if (thisSocket)
                {
                    #ifdef WIN32
                        closesocket(thisSocket);
                        WSACleanup();
                    #endif
                    #ifdef LINUX
                        close(thisSocket);
                    #endif
                }
                break;
            }
        }
        ch = 1;
        StartSocket();
    }
}
Anand Paul
  • 224
  • 2
  • 10
  • 2
    -1 is only returned if an error occurs, not if there is a disconnect. I have verified on Windows and Linux that when a peer ungracefully disconnects, recv will simply return a buffer full of zeros. – TekuConcept Mar 07 '18 at 18:19
  • @TekuConcept Incorrect. It will return -1 with `errno == ECONNRESET`, and it won't do anything to the buffer at all. – user207421 Aug 11 '20 at 23:46
  • According to the man page, you are right! I guess I overlooked that line _"Additional errors may be generated and returned from the underlying protocol modules"_ – TekuConcept Aug 12 '20 at 01:44
-8

select (with the read mask set) will return with the handle signalled, but when you use ioctl* to check the number of bytes pending to be read, it will be zero. This is a sign that the socket has been disconnected.

This is a great discussion on the various methods of checking that the client has disconnected: Stephen Cleary, Detection of Half-Open (Dropped) Connections.

* for Windows use ioctlsocket.

Troyseph
  • 4,960
  • 3
  • 38
  • 61
sep
  • 3,409
  • 4
  • 29
  • 32
  • Thanks! This answer was quite helpful to me as well :) – Michael Mior Sep 21 '10 at 18:05
  • 86
    This is absolutely and positively **NOT** a 'sign that the socket has been disconnected'. It is a sign that there is no data present in the socket receive buffer. Period. It isn't the same thing by a country mile. The article you cite to support your answer doesn't even mention this technique. – user207421 Jul 15 '13 at 22:41
  • Socket will be signalled when it receives data, but if a checksum doesn't check out then no data will be in the read buffer after. – Mark K Cowan Aug 21 '14 at 09:13
  • 3
    @MarkKCowan Very hard to believe. The data shouldn't even get into the socket receive buffer until it has passed checksum validation. Do you have a source or a repeatable experiment for your claim? – user207421 Aug 21 '14 at 09:30
  • @MarkKCowan So it's a bug. They should try fixing it. – user207421 Aug 21 '14 at 09:57
  • Bug or not, it's documented behaviour so programs need to be able to handle it. – Mark K Cowan Aug 21 '14 at 09:58
  • 2
    @MarkKCowan It's only documented in the bug you cited. It's not documented in the specification of the IOCTL. There can be zero bytes to read at any time, most usually because the peer hasn't sent anything. This is not a correct technique. – user207421 Dec 03 '15 at 10:25
  • If you are polling the socket - with poll() for example - the POLLIN flag will be set when the client disconnects. If you then attempt to read from the socket as a result of this flag being set, the number of bytes read will be zero, signaling that the sending socket has disconnected. – Alexander Bolinsky Jul 20 '16 at 16:30
  • 2
    @EJP does not 0 byte read signify EOF (i.e. peer has closed the connection) ? If there is nothing on the socket and if you try to read it would give you an EWOULDBLOCK/EAGAIN error, not a 0 byte read. – ustulation Sep 13 '17 at 15:34
  • 1
    @Matthieu: Can you pls point me to one ? I don't think you can ever get a 0 byte read in TCP at application level (yes you might get it for ACKs etc., but that's not propagated to the user of the socket) which does not mean an EOF. – ustulation Dec 03 '18 at 14:52
  • Well, it would be a libc *bad* implementation. "Bad" because `read()` is supposed to block until a byte is read (in blocking mode). "Legal" 0 returns would be when 0 is given in the "count" argument. Otherwise, -1 and errno EWOULDBLOCK/EAGAIN indeed in non-blocking mode. I just remembered a special flag you could use to simulate non-blocking operations on blocking FD, where I could understand such behaviour (and can't remember which IOCTL it was). But that's corner cases... – Matthieu Dec 03 '18 at 15:06
  • @ustulation Yes, 0 bytes *read* signifies EOS, but this answer is talking about 0 bytes *available*, as reported by `ioctl()/FIONREAD`, which doesn't mean that at all. – user207421 Apr 21 '19 at 10:08
  • +1 This is the right answer, at least under Windows. Subjectively, this is the only way I found to detect that a HTTP/1.1 connection was gracefully closed on the server side after KeepAlive timeout. Objectively, this behavior is [documented by Microsoft in the documentation about `select`](https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select#remarks): *"If the virtual circuit was closed gracefully, and all data was received, then a recv will return immediately with zero bytes read."* – Arnaud Bouchez Dec 11 '19 at 08:25
  • @ArnaudBouchez No it isn't. See my comments. Zero bytes available does not imply end of stream. Your quotation is about `recv()`, not about `iotcl()`, which is what this answer is about. – user207421 Jun 18 '20 at 03:37
  • @sep Are you ever going to fix this? It's been here to mislead people for 12 years. – user207421 Aug 11 '20 at 23:43