I'm trying to track down a mystery exception from socket.recv() that has no message to provide a clue as to where it's coming from, or why. This is on Windows 7, Python 2.7.

I create a blocking socket like this:

self.client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.client.settimeout(5)

I connect to a piece of equipment which streams data out. I never send anything through this socket, I only receive. So I spin in a loop calling select:

while(True):
    # Process commands from the GUI. These can include requests to
    # connect, disconnect, or quit
    quitting = self.processCmds()
    if quitting == True:
        print 'exiting'
        return
    try:
        rlist, wlist, errlist = select.select([self.client], [], [self.client], 5)
        if self.client in errlist:
            self.reconnect()
            continue
        elif self.client in rlist:
            # We have a message.
            msg = recvall(self.client, MESSAGE_LENGTH)
            if msg is None:
                self.reconnect()
                continue

So far, so good. recvall() spins in a loop the way you'd expect, trying to read the requested number of bytes:

def recvall(conn, count):
    buf = b''
    numIterations = 0
    while count:
        try:
            recvString = conn.recv(count)
        except:
            # OH NO SOMETHING BAD HAPPENED!
            e = sys.exc_info()[0]
            return None
        if not recvString:
            # the socket has been closed by the remote end
            return None
        if (len(recvString) > 0):
            buf = buf + recvString
            count = count - len(recvString)
        else:
            # If the other guy has sent no bytes, don't wait forever
            numIterations += 1
            if numIterations > 100:
                return None
    return buf

So if anything goes wrong, recvall() returns None, which tells the caller to close the existing connection and open a new one.

This works nicely for about 8 hours or so, but then I see a stream of reconnection attempts in my log, none successful. After some hunting and pecking with breakpoints, I learned that we're hitting the exception in recvall() where the comment OH NO SOMETHING BAD HAPPENED! is. But when I examine the exception object in the debugger, its message member is None. In fact, all its members are None, so there's no clue about who's raising it or why.

There are a couple of weird things about this. First, the exception is happening AFTER select() has already told me that the socket is error-free and contains data. Can something go wrong between the call to select() and the call to recvall()? It's possible, but hardly seems likely.

Second, has anyone ever seen such a weird exception coming from socket.recv()? Any clue where to look for it?

  • The exception may carry useful information, but it's not clear that you print it out. That's the first thing I'd do. Second, if the client disconnects, you can only detect it when you try to read or write. It's a quirk of the UNIX TCP/IP socket API, but select won't tell you that. See http://stackoverflow.com/questions/283375/detecting-tcp-client-disconnect – rts1 Jan 29 '16 at 15:34
  • Good suggestion. I think my next bright idea is to remove the catch for that exception and run it without the IDE. Maybe the command-line traceback will be more enlightening than what the IDE is showing me. Sometimes Visual Studio + PTVS is funny about giving you the whole story when an exception occurs. I do suspect something's going wrong on the other end, and maybe having more info. about that exception will tell me what it is. – user2623722 Jan 29 '16 at 16:09

1 Answer

It looks like the remote end is forcibly closing the connection, but Visual Studio + PTVS is being funny about showing all the information about the exception. Also, a forcible close (a TCP reset) is reported through an exception from recv(), rather than by recv() returning an empty string, which is how an orderly close is signalled.
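To illustrate the orderly-close case, here is a minimal sketch. It assumes a loopback listener standing in for the equipment, and the helper name demo_orderly_close is hypothetical. After the peer closes, select() reports the socket as readable with an empty error list, and recv() returns an empty string. As far as I can tell, a reset connection is also reported as readable, but recv() then raises instead of returning.

```python
import select
import socket


def demo_orderly_close():
    # Loopback listener standing in for the equipment (assumption for this demo).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(server.getsockname())
    peer, _ = server.accept()

    peer.close()  # the remote end closes the connection in an orderly way

    # select() flags the socket as readable; the error list stays empty.
    rlist, _, errlist = select.select([client], [], [client], 5)
    readable = client in rlist
    in_error = client in errlist

    # Reading now yields an empty string, which means "end of stream".
    data = client.recv(16) if readable else None

    client.close()
    server.close()
    return readable, in_error, data
```

So a socket showing up in rlist only means that recv() will not block, not that there is data to read.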

I disabled the exception handler and ran the script outside the IDE so that Python would print the full traceback. This is what I got:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "../shared\retrieveMessage.py", line 184, in run
    msgLength = recvall(self.client, 2)
  File "../shared\retrieveMessage.py", line 52, in recvall
    recvString = conn.recv(count)
error: [Errno 10054] An existing connection was forcibly closed by the remote host

So it looks like the other side is misbehaving after all.
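For reference, here is a minimal sketch of how recvall() could report the failure instead of swallowing it. The name recvall_logged and the use of the logging module are my assumptions, not part of the original code. Note that sys.exc_info()[0] is the exception class rather than the instance (the instance is at index 1), which may explain why its members looked empty in the debugger.

```python
import logging
import socket

log = logging.getLogger(__name__)


def recvall_logged(conn, count):
    """Read exactly `count` bytes from conn; return None on any failure."""
    buf = b''
    while count:
        try:
            chunk = conn.recv(count)
        except socket.timeout:
            # Must come first: in Python 2.7 socket.timeout subclasses socket.error.
            log.warning("recv timed out with %d bytes still expected", count)
            return None
        except socket.error as e:
            # For example errno 10054 on Windows when the peer resets the connection.
            log.error("recv failed: errno=%s (%s)", getattr(e, "errno", None), e)
            return None
        if not chunk:
            # An empty string means the remote end closed the connection cleanly.
            log.info("connection closed by the remote end")
            return None
        buf += chunk
        count -= len(chunk)
    return buf
```

The caller can keep treating None as "reconnect", while the log now records which of the three cases actually occurred.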