27

I'm writing a TCP server that can take 15 seconds or more to begin generating the body of a response to certain requests. Some clients like to close the connection at their end if the response takes more than a few seconds to complete.

Since generating the response is very CPU-intensive, I'd prefer to halt the task the instant the client closes the connection. At present, I don't find this out until I send the first payload and receive various hang-up errors.

How can I detect that the peer has closed the connection without sending or receiving any data? That means for recv that all data remains in the kernel, or for send that no data is actually transmitted.

Matt Joiner
  • can you `setsockopt(...SOL_SOCKET, SO_KEEPALIVE...)` from python? – ninjalj Dec 02 '11 at 00:55
  • 1
    what kind of server is this: http, vanilla tcp sockets, or some other tcp based protocol? – Foon Dec 02 '11 at 21:38
  • @Foon: It's vanilla TCP. – Matt Joiner Dec 03 '11 at 07:28
  • If you use twisted, it encapsulates a lot of this logic and you just have to supply a callback for the disconnect event: http://twistedmatrix.com/documents/current/core/howto/servers.html – Bashwork Dec 08 '11 at 20:09
  • 3
    @Bashwork: I can't stand Twisted, it completely inverts control flow and makes your code verbose and unreadable. – Matt Joiner Dec 08 '11 at 22:44
  • 1
    Not to flame, but that sounds very hyperbolic. Nothing on that page was verbose, unreadable, or "inverted". – Bashwork Dec 09 '11 at 15:27
  • @Bashwork twisted will not detect 'dead client connection'. as mentioned in this question: http://stackoverflow.com/questions/4218169/twisted-not-detecting-client-disconnects – ayyayyekokojambo Jan 21 '13 at 13:40

7 Answers

28

The select module contains what you'll need. If you only need Linux support and have a sufficiently recent kernel, select.epoll() should give you the information you need. Most Unix systems will support select.poll().

If you need cross-platform support, the standard way is to use select.select() to check if the socket is marked as having data available to read. If it is, but recv() returns zero bytes, the other end has hung up.

I've always found Beej's Guide to Network Programming good (note it is written for C, but is generally applicable to standard socket operations), while the Socket Programming How-To has a decent Python overview.

Edit: The following is an example of how a simple server could be written to queue incoming commands but quit processing as soon as it finds the connection has been closed at the remote end.

import select
import socket
import time

# Create the server.
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind((socket.gethostname(), 7557))
serversocket.listen(1)

# Wait for an incoming connection.
clientsocket, address = serversocket.accept()
print 'Connection from', address[0]

# Control variables.
queue = []
cancelled = False

while True:
    # If nothing queued, wait for incoming request.
    if not queue:
        queue.append(clientsocket.recv(1024))

    # Receive data of length zero ==> connection closed.
    if len(queue[0]) == 0:
        break

    # Get the next request and remove the trailing newline.
    request = queue.pop(0)[:-1]
    print 'Starting request', request

    # Main processing loop.
    for i in xrange(15):
        # Do some of the processing.
        time.sleep(1.0)

        # See if the socket is marked as having data ready.
        r, w, e = select.select((clientsocket,), (), (), 0)
        if r:
            data = clientsocket.recv(1024)

            # Length of zero ==> connection closed.
            if len(data) == 0:
                cancelled = True
                break

            # Add this request to the queue.
            queue.append(data)
            print 'Queueing request', data[:-1]

    # Request was cancelled.
    if cancelled:
        print 'Request cancelled.'
        break

    # Done with this request.
    print 'Request finished.'

# If we got here, the connection was closed.
print 'Connection closed.'
clientsocket.close()
serversocket.close()

To use it, run the script and in another terminal telnet to localhost, port 7557. The output from an example run I did, queueing three requests but closing the connection during the processing of the third one:

Connection from 127.0.0.1
Starting request 1
Queueing request 2
Queueing request 3
Request finished.
Starting request 2
Request finished.
Starting request 3
Request cancelled.
Connection closed.

epoll alternative

Another edit: I've worked up another example using select.epoll to monitor events. I don't think it offers much over the original example as I cannot see a way to receive an event when the remote end hangs up. You still have to monitor the data received event and check for zero length messages (again, I'd love to be proved wrong on this statement).

import select
import socket
import time

port = 7557

# Create the server.
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind((socket.gethostname(), port))
serversocket.listen(1)
serverfd = serversocket.fileno()
print "Listening on", socket.gethostname(), "port", port

# Make the socket non-blocking.
serversocket.setblocking(0)

# Initialise the list of clients.
clients = {}

# Create an epoll object and register our interest in read events on the server
# socket.
ep = select.epoll()
ep.register(serverfd, select.EPOLLIN)

while True:
    # Check for events.
    events = ep.poll(0)
    for fd, event in events:
        # New connection to server.
        if fd == serverfd and event & select.EPOLLIN:
            # Accept the connection.
            connection, address = serversocket.accept()
            connection.setblocking(0)

            # We want input notifications.
            ep.register(connection.fileno(), select.EPOLLIN)

            # Store some information about this client.
            clients[connection.fileno()] = {
                'delay': 0.0,
                'input': "",
                'response': "",
                'connection': connection,
                'address': address,
            }

            # Done.
            print "Accepted connection from", address

        # A socket was closed on our end.
        elif event & select.EPOLLHUP:
            print "Closed connection to", clients[fd]['address']
            ep.unregister(fd)
            del clients[fd]

        # Error on a connection.
        elif event & select.EPOLLERR:
            print "Error on connection to", clients[fd]['address']
            ep.modify(fd, 0)
            clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

        # Incoming data.
        elif event & select.EPOLLIN:
            print "Incoming data from", clients[fd]['address']
            data = clients[fd]['connection'].recv(1024)

            # Zero length = remote closure.
            if not data:
                print "Remote close on ", clients[fd]['address']
                ep.modify(fd, 0)
                clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

            # Store the input.
            else:
                print data
                clients[fd]['input'] += data

        # Run when the client is ready to accept some output. The processing
        # loop registers for this event when the response is complete.
        elif event & select.EPOLLOUT:
            print "Sending output to", clients[fd]['address']

            # Write as much as we can.
            written = clients[fd]['connection'].send(clients[fd]['response'])

            # Delete what we have already written from the complete response.
            clients[fd]['response'] = clients[fd]['response'][written:]

            # When all of the response is written, shut down the connection.
            if not clients[fd]['response']:
                ep.modify(fd, 0)
                clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

    # Processing loop.
    for client in clients.keys():
        clients[client]['delay'] += 0.1

        # When the 'processing' has finished.
        if clients[client]['delay'] >= 15.0:
            # Reverse the input to form the response.
            clients[client]['response'] = clients[client]['input'][::-1]

            # Register for the ready-to-send event. The network loop uses this
            # as the signal to send the response.
            ep.modify(client, select.EPOLLOUT)

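    # Processing delay. This sits outside the per-client loop so the server
    # sleeps once per pass, even when no clients are connected.
    time.sleep(0.1)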

Note: This only detects proper shutdowns. If the remote end just stops listening without sending the proper messages, you won't know until you try to write and get an error. Checking for that is left as an exercise for the reader. Also, you probably want to add some error handling around the overall loop so the server itself shuts down gracefully if something breaks inside it.
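
For the hang-up event that the epoll example above could not find, Linux has an EPOLLRDHUP flag (kernel 2.6.17 and later), which Python exposes as select.EPOLLRDHUP. The following is a minimal sketch and not part of the original answer. The function name wait_for_peer_close is made up for illustration, and it carries the same limitation as the note above: it only sees orderly shutdowns, not a peer that silently vanishes.

```python
import select
import socket


def wait_for_peer_close(sock, timeout=0):
    """Return True if the peer has hung up, without reading any data.

    Hypothetical helper. Linux only: relies on select.epoll and EPOLLRDHUP.
    """
    ep = select.epoll()
    try:
        # Ask only for the "peer closed" event. EPOLLHUP and EPOLLERR are
        # always reported by the kernel, whether requested or not.
        ep.register(sock.fileno(), select.EPOLLRDHUP)
        hangup = select.EPOLLRDHUP | select.EPOLLHUP | select.EPOLLERR
        for _fd, event in ep.poll(timeout):
            if event & hangup:
                return True
        return False
    finally:
        ep.close()
```

Because EPOLLIN is never registered and recv() is never called, any bytes the client sent before closing stay in the kernel buffer. In a real server you would probably register the flag once on a long-lived epoll object rather than creating one per check.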

Thierry Lathuille
Blair
  • Unfortunately, as far as I am aware that is the only reliable, cross-platform way to check for a closed connection (it's essentially how sockets were designed to work). I would like someone to prove me wrong however - I've run into this sort of problem in the past. The general solution I've used is to buffer any actual data obtained from the ``recv()`` call for processing in the future. I've updated my answer with an example of this. If your situation means this cannot be used, I'd suggest updating your question with more details to see if anyone can offer a more suitable solution. – Blair Apr 17 '11 at 03:41
  • Was mentioning epoll accidental? Cross platform isn't really a requirement here, I'm more than happy to go Linux only if it makes things more reliable. – Matt Joiner Apr 21 '11 at 13:01
  • 1
    +1, `select` is the right way to go. buffer any data you get when calling `recv` and bail out on zero (or error, which you may get if the connection wasn't shut down gracefully) – Hasturkun Dec 08 '11 at 17:32
  • @MattJoiner, I've added an example using epoll. Unfortunately it doesn't seem to offer any real advantage - as far as I can see you still have to check for zero-length incoming messages to detect if the client closes the connection. – Blair Dec 08 '11 at 21:41
  • This answer doesn't come close. You're reading data. It's also too long, sorry. – Matt Joiner Dec 08 '11 at 23:21
  • 1
    @MattJoiner: If you really aren't expecting data during this time, then you only need to deal with `recv` returning `0` or `-1` and don't need to buffer. To ensure you are woken up even for the case where the remote end simply vanishes (as opposed to a graceful or non-graceful shutdown), use keepalive. – Hasturkun Dec 09 '11 at 15:57
18

I've had a recurring problem communicating with equipment that had separate TCP links for send and receive. The basic problem is that the TCP stack doesn't generally tell you a socket is closed when you're just trying to read - you have to try and write to get told the other end of the link was dropped. Partly, that is just how TCP was designed (reading is passive).

I'm guessing Blair's answer works in the cases where the socket has been shut down nicely at the other end (i.e. they have sent the proper disconnection messages), but not in the case where the other end has impolitely just stopped listening.

Is there a fairly fixed-format header at the start of your message, that you can begin by sending, before the whole response is ready? e.g. an XML doctype? Also are you able to get away with sending some extra spaces at some points in the message - just some null data that you can output to be sure the socket is still open?
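
A minimal sketch of the padding idea, assuming the protocol tolerates a harmless filler byte such as a space. The helper name send_padding is hypothetical. Note that with TCP the first write after the peer has gone away may still appear to succeed, with the error only surfacing on a later write, so this is a probe to repeat during the long computation rather than a one-shot test.

```python
import socket


def send_padding(sock, padding=b" "):
    """Try to send harmless filler bytes (hypothetical helper).

    Returns True if the write was accepted, False if it failed, which
    suggests the other end of the link has been dropped.
    """
    try:
        sock.sendall(padding)
        return True
    except OSError:
        # Covers EPIPE, ECONNRESET and similar hang-up errors.
        return False
```

The CPU-intensive task could call this every second or so and abandon the work as soon as it returns False.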

asc99c
12

The socket KEEPALIVE option lets you detect this kind of "drop the connection without telling the other end" scenario.

You should set the SO_KEEPALIVE option at SOL_SOCKET level. In Linux, you can modify the timeouts per socket using TCP_KEEPIDLE (seconds before sending keepalive probes), TCP_KEEPCNT (failed keepalive probes before declaring the other end dead) and TCP_KEEPINTVL (interval in seconds between keepalive probes).

In Python:

import socket
...
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 1)
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 1)
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPCNT, 5)

netstat -tanop will show that the socket is in keepalive mode:

tcp        0      0 127.0.0.1:6666          127.0.0.1:43746         ESTABLISHED 15242/python2.6     keepalive (0.76/0/0)

while tcpdump will show the keepalive probes:

01:07:08.143052 IP localhost.6666 > localhost.43746: . ack 1 win 2048 <nop,nop,timestamp 848683438 848683188>
01:07:08.143084 IP localhost.43746 > localhost.6666: . ack 1 win 2050 <nop,nop,timestamp 848683438 848682438>
01:07:09.143050 IP localhost.6666 > localhost.43746: . ack 1 win 2048 <nop,nop,timestamp 848683688 848683438>
01:07:09.143083 IP localhost.43746 > localhost.6666: . ack 1 win 2050 <nop,nop,timestamp 848683688 848682438>
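
The setsockopt calls above can be wrapped in a small helper. This is a sketch only: the name enable_keepalive and its default values are assumptions chosen to mirror the numbers used above, and the TCP_KEEP* constants are not defined on every platform, so each one is set only if the socket module provides it.

```python
import socket


def enable_keepalive(sock, idle=1, interval=1, count=5):
    """Turn on TCP keepalive with short timeouts (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    options = (
        ("TCP_KEEPIDLE", idle),       # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    )
    for name, value in options:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Keepalive by itself does not notify your code. Once the probes fail, the error should show up on the next select, recv or send on that socket, so this still needs to be combined with a periodic check.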
ninjalj
3

After struggling with a similar problem I found a solution that works for me, but it does require calling recv() in non-blocking mode and trying to read data, like this:

bytecount=recv(connectionfd,buffer,1000,MSG_NOSIGNAL|MSG_DONTWAIT);

The nosignal tells it to not terminate program on error, and the dontwait tells it to not block. In this mode, recv() returns one of 3 possible types of responses:

  • -1 if there is no data to read or other errors.
  • 0 if the other end has hung up nicely
  • 1 or more if there was some data waiting.

So by checking the return value, if it is 0 then that means the other end hung up. If it is -1 then you have to check the value of errno. If errno is equal to EAGAIN or EWOULDBLOCK then the connection is still believed to be alive by the server's tcp stack.

This solution would require you to put the call to recv() into your intensive data processing loop -- or somewhere in your code where it would get called 10 times a second or whatever you like, thus giving your program knowledge of a peer who hangs up.

This of course will do no good for a peer who goes away without doing the correct connection shutdown sequence, but any properly implemented tcp client will correctly terminate the connection.

Note also that if the client sends a bunch of data then hangs up, recv() will probably have to read that data all out of the buffer before it'll get the empty read.
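
Since the question is about Python, here is a rough Python equivalent of the recv() call above, offered as a sketch rather than a drop-in solution. The function name poll_peer and the state strings are invented for illustration. MSG_DONTWAIT is only available on Unix-like systems, and MSG_NOSIGNAL is left out because the Python interpreter ignores SIGPIPE by default.

```python
import socket


def poll_peer(sock, bufsize=4096):
    """Non-blocking check modelled on the C recv() call (hypothetical helper).

    Returns a (state, data) tuple where state is 'open', 'closed' or 'data'.
    """
    try:
        # MSG_DONTWAIT makes this one call non-blocking.
        data = sock.recv(bufsize, socket.MSG_DONTWAIT)
    except BlockingIOError:
        # EAGAIN / EWOULDBLOCK: nothing to read, connection believed alive.
        return "open", b""
    except OSError:
        # Any other error (for example ECONNRESET) is treated as a hang-up.
        return "closed", b""
    if not data:
        # Zero bytes: the other end hung up nicely.
        return "closed", b""
    # The data has been consumed, so the caller must buffer it.
    return "data", data
```

Calling this from inside the processing loop, say ten times a second, and stopping when the state is 'closed' matches the approach described above.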

Jesse Gordon
0

This code is very simple: it goes back to accepting connections forever and catches Ctrl+C to end the program, closing the port. Change the port to suit your needs.

import select
import socket
import sys
import time

# Create the listening socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = ('', 2105)
print('starting up on {} port {}'.format(*server_address))
sock.bind(server_address)
sock.listen(1)

# Main loop.
while True:
    # Wait for a new connection.
    print('waiting for a connection')
    connection, client_address = sock.accept()
    try:
        print('connection from', client_address)
        # Connection loop.
        while True:
            try:
                # Zero timeout: poll for readability without blocking.
                r, w, e = select.select((connection,), (), (), 0)
                if r:
                    data = connection.recv(16)
                    # Zero bytes means the peer closed the connection.
                    if len(data) == 0:
                        break
                    print(data)
                    # Example: echo the received data back to the client.
                    connection.sendall(data)

            except KeyboardInterrupt:
                connection.close()
                sys.exit()

            except Exception as e:
                # A socket error means the connection is unusable, so stop
                # instead of silently retrying forever.
                print(e)
                break

            # Let the socket receive some data.
            time.sleep(0.1)

    except Exception as e:
        print(e)

    finally:
        # Clean up the connection.
        connection.close()
-1

Check out select module.

Rumple Stiltskin
-1

You can select with a timeout of zero, and read with the MSG_PEEK flag.

I think you really should explain precisely what you mean by "not reading", and why the other answers are not satisfactory.
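
A minimal sketch of the zero-timeout select plus MSG_PEEK combination, with the made-up function name peer_has_closed. One limitation to keep in mind: if the client sent data that has not been read yet, the peek returns that data and the close stays hidden until the data is consumed.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has closed and no unread data remains.

    Hypothetical helper: nothing is removed from the receive buffer.
    """
    # Timeout of zero: ask whether the socket is readable without waiting.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending, so no sign of a close yet
    try:
        # MSG_PEEK leaves any pending bytes in the kernel buffer.
        # An empty result means the peer has shut the connection down.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionError:
        return True  # reset or aborted connection
```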

shodanex
  • 4
    This won't work since `recv` will return non-zero if there is data waiting, even if the peer has already closed the connection. – Matt Joiner Dec 08 '11 at 23:20