
I'm creating a server/client application in Python using the socket module, and for whatever reason my server keeps ending the connection. The weird part is that this works perfectly on Windows but not on Linux. I've looked all over for a possible solution, but nothing I found worked. Below is a sanitized version of the code that reproduces the bug, although it succeeds more often than the real code, which normally never works. Hopefully this is still enough information. Thanks!

Server:

import logging
import socket
import threading
import time

def getData():
    HOST = "localhost"
    PORT = 5454

    while True:
        s = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
        s.setsockopt( socket.SOL_SOCKET, socket.SO_REUSEADDR, 1 ) #because linux doesn't like reusing addresses by default

        s.bind( ( HOST, PORT ) )
        logging.debug( "Server listens" )
        s.listen( 5 )
        conn, addr = s.accept()
        logging.debug( "Client connects" )
        print "Connected by,", addr

        dataRequest = conn.recv( 1024 )
        logging.debug( "Server received message" )

        time.sleep( .01 ) #usually won't have to sample this fast

        data = """Here is some data that is approximately the length 
of the data that I am sending in my real server. It is a string that
doesn't contain any unordinary characters except for maybe a tab."""

        if not timeThread.isAlive(): #lets client know test is over
            data = "\t".join( [ data, "Terminate" ] )
            conn.send( data )
            s.close()
            print "Finished"
            print "Press Ctrl-C to quit"
            break
        else:
            logging.debug( "Server sends data back to client" )
            conn.send( data )

        logging.debug( "Server closes socket" )
        s.close()

def timer( t ):
    start = time.time()
    while ( time.time() - start ) < t:
        time.sleep( .4 )
        #sets flag for another thread not here

def main():
    global timeThread

    logging.basicConfig( filename="test.log", level=logging.DEBUG )

    #time script runs for
    t = 10 #usually much longer (hours)

    timeThread = threading.Thread( target=timer, args=( t, ) )
    dataThread = threading.Thread( target=getData, args=() )
    timeThread.start()
    dataThread.start()

    #just for testing so I can quit threads when sockets break
    while True:
        time.sleep( .1 )

    timeThread.join()
    dataThread.join()

if __name__ == "__main__":
    main()

Client:

import logging
import socket

def getData():
    dataList = []
    termStr = "Terminate"

    data = sendDataRequest()
    while termStr not in data:
        dataList.append( data )
        data = sendDataRequest()
    dataList.append( data[ :-len( termStr )-1 ] )

def sendDataRequest():
    HOST = "localhost"
    PORT = 5454

    s = socket.socket( socket.AF_INET, socket.SOCK_STREAM )

    while True:
        try:
            s.connect( ( HOST, PORT ) )
            break
        except socket.error:
            print "Connecting to server..."

    logging.debug( "Client sending message" )
    s.send( "Hey buddy, I need some data" ) #approximate length

    try:
        logging.debug( "Client starts reading from socket" )
        data = s.recv( 1024 )
        logging.debug( "Client done reading" )
    except socket.error, e:
        logging.debug( "Client throws error: %s", e )

    print data

    logging.debug( "Client closes socket" )
    s.close()

    return data

def main():
    logging.basicConfig( filename="test.log", level=logging.DEBUG )
    getData()

if __name__ == "__main__":
    main()

Edit: Adding traceback

Traceback (most recent call last):
    File "client.py", line 39, in <module>
        main()
    File "client.py", line 36, in main
        getData()
    File "client.py", line 10, in getData
        data = sendDataRequest()
    File "client.py", line 28, in sendDataRequest
        data = s.recv( 1024 )
socket.error: [Errno 104] Connection reset by peer

Edit: Added debugging

DEBUG:root:Server listens
DEBUG:root:Client sending message
DEBUG:root:Client connects
DEBUG:root:Client starts reading from socket
DEBUG:root:Server received message
DEBUG:root:Server sends data back to client
DEBUG:root:Server closes socket
DEBUG:root:Client done reading
DEBUG:root:Server listens
DEBUG:root:Client sending message
DEBUG:root:Client connects
DEBUG:root:Client starts reading from socket
DEBUG:root:Server received message
DEBUG:root:Server sends data back to client
DEBUG:root:Client done reading
DEBUG:root:Client sending message
DEBUG:root:Client starts reading from socket
DEBUG:root:Server closes socket
DEBUG:root:Client throws error: [Errno 104] Connection reset by peer
DEBUG:root:Server listens

Tom's theory appears to be correct. I'll try to figure out how to close the connection better.

This isn't solved but the accepted answer seems to point out the problem.

Edit: I tried using Tom's getData() function and it looks like the server still closes the connection too soon. Should be repeatable since I couldn't get it to work in Windows either.

Server Output/Traceback:

Connected by, ('127.0.0.1', 51953)
Exception in thread Thread-2:
Traceback (most recent call last):
    File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
        self.run()
    File "/usr/lib64/python2.6/threading.py", line 484, in run
        self.__target(*self.__args, **self.__kwargs)
    File "server.py", line 15, in getData
        s.bind( ( HOST, PORT ) )
    File "<string>", line 1, in bind
error: [Errno 22] Invalid argument

Client Output/Traceback:

Here is some data that is approximately the length
of the data that I am sending in my real server. It is a string that
doesn't contain any unordinary characters except for maybe a tab.
Traceback (most recent call last):
    File "client.py", line 49, in <module>
        main()
    File "client.py", line 46, in main
        getData()
    File "client.py", line 11, in getData
        data = sendDataRequest()
    File "client.py", line 37, in sendDataRequest
        print data
UnboundLocalError: local variable 'data' referenced before assignment

Log:

DEBUG:root:Server listens
DEBUG:root:Client sending message
DEBUG:root:Client connects
DEBUG:root:Client starts reading from socket
DEBUG:root:Server received message
DEBUG:root:Server sends data back to client
DEBUG:root:Server closes connection
DEBUG:root:Client done reading
DEBUG:root:Client closes socket
DEBUG:root:Client sending message
DEBUG:root:Client starts reading from socket
DEBUG:root:Client throws error: [Errno 104] Connection reset by peer

Update: I used Tom's getData() function but moved the s.bind() to before the loop and got it to work. I honestly don't know why this works, so it would be great if somebody could explain why it is safe for the server to close its client socket but not its server socket. Thanks!

smbullet
  • Are you 100000% positive this isn't an actual connectivity issue? What do tcpdump/wireshark say? Can you test connectivity with `nc`? – admdrew Jul 21 '14 at 21:17
  • Can you give more specific information as to how far both the server and client get before you encounter the error, and what the exact error/stack trace is for both client and server? – Tom Dalton Jul 21 '14 at 21:19
  • @admdrew I don't see how it could be a connectivity issue with localhost but I can get that in a few minutes – smbullet Jul 21 '14 at 21:21
  • @TomDalton Sure, I'll post that in a few minutes. As to how far, usually 5 seconds. Sometimes it completes (10s) but never multiple times in a row. – smbullet Jul 21 '14 at 21:23
  • Your server's `while True` loop appears to keep opening a new socket without any signal from the client that it is finished with the socket or has finished reading data. Is that intentional? – Tom Dalton Jul 21 '14 at 21:26
  • Also looks like the indentation for getData() is wrong - I'm assuming the while True loop is within that function. – Tom Dalton Jul 21 '14 at 21:29
  • That was intentional. I assumed since the server was sending less than the client was reading at once I wouldn't need an "if not data: break". Let me fix that indentation though. – smbullet Jul 21 '14 at 21:40
  • You can't/shouldn't make any assumptions as to how the data is fragmented while sending. Just because you send X bytes at once doesn't mean the other end will receive all X at the same time. You do need to implement some sort of protocol. – Tom Dalton Jul 21 '14 at 21:42
  • @TomDalton How would you recommend implementing that? Just have the client wait until it has all of the data and then only have the client close the socket to prevent the server from closing it too early? – smbullet Jul 21 '14 at 21:51

3 Answers


While I can't reproduce this issue (on Windows 7 64-bit, Python 2.7), my best guess is that the following is happening:

  • Server listens
  • Client connects
  • Client sends "Hey buddy, I need some data"
  • Server receives this
  • Server sends data back to client
  • Server closes socket
  • Client attempts to read from socket, finds that it's been closed
  • Client throws the "connection reset by peer" error.

The stacktrace you added from the client seems to support this theory. Is it possible to prove that isn't the case with some additional logging or similar?

Other things of note: If your client doesn't find the terminate string in the first data it receives, it opens a new socket to the server. That looks wrong to me - you should read data from the same socket until you have it all.
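
As a rough sketch of that last point, a client can keep calling recv() on the one socket until the server closes its end. This is written for Python 3 (the question uses Python 2), and `recv_until_closed` is a hypothetical helper name, not part of the socket module. It assumes the server signals the end of the message by closing the connection, as the server in the question does.

```python
import socket


def recv_until_closed(sock, bufsize=1024):
    """Collect bytes from sock until the peer closes its end.

    recv() may return fewer bytes than were sent in one call, so a single
    call is not guaranteed to yield the whole message. An empty result
    means the peer has closed the connection.
    """
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # b"" signals an orderly close by the peer
            break
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # socketpair() stands in for a real client/server connection here.
    server_end, client_end = socket.socketpair()
    server_end.sendall(b"first part, ")
    server_end.sendall(b"second part\tTerminate")
    server_end.close()  # closing tells the reader there is nothing more
    print(recv_until_closed(client_end, bufsize=8))
    client_end.close()
```

With this approach the "Terminate" check would be applied to the complete message rather than to whatever the first recv() happened to return.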

Edit: Couple more things:

In your example log output, you haven't updated the code so I can't see where each log line comes from. However, it looks suspiciously like you have 2 clients running in parallel (in different processes or threads maybe?), which leads to:

I just noticed one final thing. In the example at https://docs.python.org/2/library/socket.html#example, the server doesn't close the listening socket; it closes the connection returned by accept(). It may be that you have two clients connected through the same server socket, so that closing the server socket disconnects both of them, not just the first. If you are running multiple clients, logging some sort of identity (e.g. DEBUG:root:Client(6) done reading) might help prove that.

Could you try the following for the server's data thread main loop? It should show whether the problem is related to closing the listening socket rather than the connected socket:


def getData():
    HOST = "localhost"
    PORT = 5454

    s = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    # s.setsockopt( socket.SOL_SOCKET, socket.SO_REUSEADDR, 1 ) #because linux doesn't like reusing addresses by default
    s.bind( ( HOST, PORT ) )
    logging.debug( "Server listens" )
    s.listen( 5 )

    while True:

        conn, addr = s.accept()
        logging.debug( "Client connects" )
        print "Connected by,", addr

        dataRequest = conn.recv( 1024 )
        logging.debug( "Server received message" )

        time.sleep( .01 ) #usually won't have to sample this fast

        data = """Here is some data that is approximately the length 
of the data that I am sending in my real server. It is a string that
doesn't contain any unordinary characters except for maybe a tab."""

        if not timeThread.isAlive(): #lets client know test is over
            data = "\t".join( [ data, "Terminate" ] )
            conn.send( data )
            conn.close()
            print "Finished"
            print "Press Ctrl-C to quit"
            break
        else:
            logging.debug( "Server sends data back to client" )
            conn.send( data )

        logging.debug( "Server closes connection" )
        conn.close()
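
A possible explanation for why closing only `conn` matters, sketched in Python 3 rather than the question's Python 2: the listening socket holds a queue (the backlog) of connections whose handshake the kernel has already completed on the server's behalf. `connect_before_accept` is a hypothetical helper name, and port 0 is used only so that the OS picks a free port. If the listening socket is closed while a client sits in that queue, the queued connection is discarded, and the client may see "Connection reset by peer", which would be consistent with the log in the question.

```python
import socket


def connect_before_accept(host="127.0.0.1"):
    """Show that a client can connect and send before accept() is called."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    # The kernel completes the handshake and queues the connection in the
    # listener's backlog, so connect() and sendall() succeed even though
    # the server has not called accept() yet.
    client.connect((host, port))
    client.sendall(b"Hey buddy, I need some data")

    # Closing `listener` at this point would throw away the queued
    # connection. Keeping it open lets accept() pick the connection up.
    listener.settimeout(5)
    conn, _addr = listener.accept()
    conn.settimeout(5)
    request = conn.recv(1024)

    conn.close()      # closes only this one client connection
    client.close()
    listener.close()  # closed once, when the server is finished
    return request


if __name__ == "__main__":
    print(connect_before_accept())
```

Binding and listening once, and closing only the per-client connection inside the loop, means a client that reconnects quickly always finds a live listening socket to queue on.
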
Tom Dalton
  • I can try, what would you like? I can't do tcpdump or wireshark but I can try an nc command. – smbullet Jul 21 '14 at 21:48
  • Have you looked at using the `logging` package in python? It's thread-safe out of the box. – Tom Dalton Jul 21 '14 at 21:50
  • I have not but I can give it a look. – smbullet Jul 21 '14 at 21:50
  • What are you looking for? Do you just want me to log whenever a connection is established and data is sent/received with a timestamp? – smbullet Jul 21 '14 at 22:28
  • Okay, I added the end of the log in the OP. You appear to be correct. – smbullet Jul 22 '14 at 14:46
  • I can assure you that I only have one client running. I'll update the code with the logging. – smbullet Jul 23 '14 at 13:35
  • Thanks for continuing to help! It looks like the same thing is happening. I have the tracebacks/log edited in the OP. – smbullet Jul 24 '14 at 18:00
  • The latest error stack traces from client and server look like a different problem? – Tom Dalton Jul 25 '14 at 08:55
  • The error trace from the client is the same problem, I just have the receive in a try statement so when it failed (look in log) it just skipped to printing the data which was not defined. But I agree that the server trace is new. – smbullet Jul 25 '14 at 13:09
  • I moved `s.bind` to before the loop and got it to work! Can you explain why this works though? – smbullet Jul 25 '14 at 21:37

I'm out of my depth here, but while looking into a possibly related problem (intermittent "connection reset by peer" errors on Linux, while the same code works fine on Windows), I came across http://scie.nti.st/2008/3/14/amazon-s3-and-connection-reset-by-peer/. Our helpful debugger there, Garry Dolley, summarizes (in 2008!):

"Linux kernels 2.6.17+ increased the maximum size of the TCP window/buffer, and this started to cause other gear to wig out, if it couldn't handle sufficiently large TCP windows. The gear would reset the connection, and we see this as a 'Connection reset by peer' message."

He gives a solution involving /etc/sysctl.conf. I haven't tried this yet, but it may be worth a look.

Jacob Eliosoff

I had a similar issue where I was getting "connection reset by peer" on the sending side. It turned out that an exception was being thrown somewhere on the receiver side. When the script ends unexpectedly like that, the OS just resets (RST) the connection on that socket. This is quite an old thread, but for anyone experiencing a similar issue with threads, my advice would be: make sure that it works single-threaded before trying to make it complicated.
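
One way to make such a failure visible, as a minimal Python 3 sketch: wrap the per-connection work so that any exception is logged and the connection is still closed deliberately. `handle_safely` and `handler` are hypothetical names used only for illustration, and socketpair() stands in for a real connection.

```python
import logging
import socket


def handle_safely(conn, handler):
    """Run handler(conn); log any exception and always close conn.

    Returns True if the handler finished normally, False if it raised.
    """
    try:
        handler(conn)
        return True
    except Exception:
        # logging.exception records the full traceback, so a failure in a
        # worker thread does not go unnoticed.
        logging.exception("connection handler failed")
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    def broken_handler(conn):
        conn.recv(1024)  # read the request first
        raise ValueError("bug in the receiver")

    receiver, sender = socket.socketpair()
    sender.sendall(b"request")
    print(handle_safely(receiver, broken_handler))  # prints False
    sender.close()
```

The traceback ends up in the log instead of silently ending the thread, which makes it easier to tell a bug in the receiver apart from a network problem.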

Misho Janev