
I am running a very simple python (3.x) client-server program (both locally on my PC) for a school project (not intended for the real world) which just sends messages back-and-forth (like view customers, add customer, delete customer, etc... real basic).

Sometimes the data can be multiple records, which I had stored as namedtuples (it just made sense), and then I went down the path of using pickle to transfer them.

So for example on the client I do something like this:

s.send(message.encode('utf-8'))
pickledResponse = s.recv(4096)
response = pickle.loads(pickledResponse)

Every so often I get the following error:

response = pickle.loads(pickledResponse)
EOFError: Ran out of input

My fear is that this has something to do with my socket (TCP) transfer and maybe somehow I am not getting all the data in time for my pickle.loads - make sense? If not I am really lost as to why this would be happening so inconsistently.

However, even if I am right, I am not sure how to fix it (quickly). I was considering dropping pickle and just using strings (but couldn't this suffer from the same fate)? Does anyone have any suggestions?

Really my messages are pretty basic - usually just a command and some small data like "1=John", which means command (1), the FIND command, and then "John", and it returns the record (name, age, etc...) of John (as a namedtuple - but honestly this isn't mandatory).

Any suggestions or help would be much appreciated, looking for a quick fix...

JSchwartz
  • You should not use pickle if the source is not trusted, and, on a network, do not trust anyone, quoting the doc: "Warning The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source." – Julien Palard Jul 13 '14 at 20:16
  • You may consider not working with sockets yourself (using sockets directly means taking care of message length, which you're not doing here). I'd advocate the use of HTTP, JSON messages, and while we're at it, making all of this RESTful. – Julien Palard Jul 13 '14 at 20:22
  • @JulienPalard given that this is a school project the idea of "trusted" is not important, and the project requirements are to use sockets (TCP) for communication - sadly I cannot change that – JSchwartz Jul 13 '14 at 20:29
  • @JulienPalard knowing that it is all LOCAL (on the same PC) there is no way around the issue of dealing with length? What if I used STRINGS instead of Pickled data? Any other alternatives to avoid this issue? – JSchwartz Jul 13 '14 at 20:31
  • You can debug easily using pickle protocol 0 (human readable), or JSON; you'll see what went wrong. Closing the connection is OK if you don't want to care about the length. – Julien Palard Jul 13 '14 at 20:54
  • @JSchwartz There is no way to avoid that issue. A `recv()` call can return almost any "slice" from the _stream_ of data, even just one byte per call. If the communication is full duplex it might even return data from more than one message sent on the other side. So you either have to send just one message per connection (end of message is detected by the sender closing the connection), or you have to think of a protocol that lets you clearly recognise a complete message. – BlackJack Jul 13 '14 at 20:56
  • @BlackJack one message per connection sounds perfectly fine to me - but how will that solve the problem? Do I need to do something special at the receiving end to tell the .recv to wait for the connection to close at the other end? – JSchwartz Jul 13 '14 at 21:02
  • @JSchwartz You just have to call `recv()` until it returns an empty bytes object, which happens when the connection is closed on the other end. – BlackJack Jul 13 '14 at 21:15
  • @BlackJack ok I think I get what you mean but ... implementing it ... so what you loop on .recv() and keep adding to some buffer or something until it returns null (how do you check that in Python?) and then try to unpickle that? – JSchwartz Jul 13 '14 at 21:22
  • @BlackJack odd, my prof just said that the reason I am having issues is because I am using pickle, that if I just sent a string (converted into bytes) I wouldn't have the issue ... that make any sense? – JSchwartz Jul 13 '14 at 21:23
  • @JSchwartz No that doesn't make sense. The TCP connection doesn't care what the bytes represent. – BlackJack Jul 13 '14 at 21:25

2 Answers


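The problem with your code is that recv(4096), when used on a TCP socket, may return fewer bytes than the sender wrote in one call. TCP is a byte stream without message boundaries, so a single recv() can return only part of a pickle (or data from more than one message), and pickle.loads() then fails with EOFError on the truncated data.

The easy solution is to prefix each message with its length. For sending: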

import struct
packet = pickle.dumps(foo)
length = struct.pack('!I', len(packet))
packet = length + packet

then for receiving

import struct

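buf = b''
while len(buf) < 4:
    # sock is the connected socket object, not the socket module
    chunk = sock.recv(4 - len(buf))
    if not chunk:
        raise EOFError('connection closed before the length prefix arrived')
    buf += chunk

length = struct.unpack('!I', buf)[0]
# now recv until at least length bytes are received,
# then slice length first bytes and decode.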
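
The length-prefix idea above could be completed roughly as follows. This is only a sketch: the helper names recv_exactly, send_msg and recv_msg are made up for illustration, and it assumes both ends are trusted, since it unpickles whatever arrives.

```python
import pickle
import socket
import struct

# 4-byte unsigned length prefix in network byte order, as in the answer above.
HEADER = struct.Struct('!I')


def recv_exactly(sock, count):
    """Keep calling recv() until exactly `count` bytes have arrived."""
    buf = bytearray()
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            # An empty bytes object means the peer closed the connection.
            raise ConnectionError('connection closed in the middle of a message')
        buf += chunk
    return bytes(buf)


def send_msg(sock, obj):
    """Pickle `obj` and send it with a length prefix."""
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_msg(sock):
    """Receive one length-prefixed message and unpickle it."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return pickle.loads(recv_exactly(sock, length))
```

With these helpers, the client from the question would call send_msg(s, message) and response = recv_msg(s) instead of a single s.recv(4096), and several messages can share one connection because each one carries its own length.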

However, the Python standard library already supports message-oriented pickling over sockets, namely multiprocessing.connection, which sends and receives pickled objects with ease using Connection.send and Connection.recv respectively.

Thus you can code your server as

from multiprocessing.connection import Listener

PORT = 1234
server_sock = Listener(('localhost', PORT))
conn = server_sock.accept()

unpickled_data = conn.recv()

and client as

from multiprocessing.connection import Client

client = Client(('localhost', 1234))
client.send(['hello', 'world'])
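
To try a full request/response round trip in a single script, something like the following might work. It is a sketch under assumptions: the function names serve_once and demo are invented here, the server runs in a thread only to keep the example self-contained, and port 0 asks the operating system for any free port instead of the fixed 1234 used above.

```python
import threading
from multiprocessing.connection import Client, Listener


def serve_once(listener):
    """Accept one client, read one command, send one reply."""
    with listener.accept() as conn:
        command = conn.recv()  # e.g. '1=John'
        name = command.split('=', 1)[1]
        conn.send([(name, 42)])  # the age 42 is placeholder data


def demo():
    # Port 0 lets the OS pick a free port; listener.address reports it.
    with Listener(('localhost', 0)) as listener:
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        with Client(listener.address) as client:
            client.send('1=John')
            reply = client.recv()
        server.join()
    return reply


if __name__ == '__main__':
    print(demo())
```

Each send() on one side is matched by exactly one recv() on the other, so there is no length handling in user code.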
0

For receiving everything the server sends until it closes its side of the connection try this:

import json
import socket
from functools import partial


def main():
    message = 'Test'

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.connect(('127.0.0.1', 9999))

        sock.sendall(message.encode('utf-8'))
        sock.shutdown(socket.SHUT_WR)

        json_response = b''.join(iter(partial(sock.recv, 4096), b''))

    response = json.loads(json_response.decode('utf-8'))
    print(response)


if __name__ == '__main__':
    main()

I've used sendall() because send() has the same "problem" as recv(): it's not guaranteed that everything is sent. send() returns the number of bytes actually sent, and the programmer has to check that it matches the length of the argument and, if not, send the rest until everything is out. After sending, the writing side of the connection is closed (shutdown()) so the server knows there is no more data coming from the client. After that, all data from the server is received until the server closes its side of the connection, which results in the recv() call returning an empty bytes object.
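
Written out as explicit loops, the two ideas in the paragraph above look roughly like this. The function names send_all and recv_until_closed are hypothetical; in real code sock.sendall() already does the first loop for you.

```python
import socket


def send_all(sock, data):
    """Roughly what sendall() does: call send() until nothing is left."""
    view = memoryview(data)
    while len(view):
        sent = sock.send(view)  # may send fewer bytes than requested
        view = view[sent:]


def recv_until_closed(sock, bufsize=4096):
    """Collect chunks until recv() returns b'' (peer closed its side)."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        chunks.append(chunk)
    return b''.join(chunks)
```

The recv_until_closed loop is the long form of the b''.join(iter(partial(sock.recv, 4096), b'')) one-liner in the client code.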

Here is a suitable socketserver.TCPServer for the client:

import json
from socketserver import StreamRequestHandler, TCPServer


class Handler(StreamRequestHandler):

    def handle(self):
        print('Handle request...')
        message = self.rfile.read().decode('utf-8')
        print('Received message:', message)
        self.wfile.write(
            json.dumps(
                {'name': 'John', 'age': 42, 'message': message}
            ).encode('utf-8')
        )
        print('Finished request.')



def main():
    address = ('127.0.0.1', 9999)
    try:
        print('Start server at', address, '...')
        server = TCPServer(address, Handler)
        server.serve_forever()
    except KeyboardInterrupt:
        print('Stopping server...')


if __name__ == '__main__':
    main()

It reads the complete data from the client and puts it into a JSON-encoded response with some other, fixed items. Instead of the low-level socket operations it makes use of the more convenient file-like objects the StreamRequestHandler offers for reading from and writing to the connection. The connection is closed by the TCPServer after the handle() method finishes.

BlackJack
  • To be clear, the SERVER will close the connection established by the client after it has completed .sendall(...)? I am using a SocketServer and all my code is handled within the Handler... how does the server terminate the connection made by the client? and won't the client get an error that their connection was severed? – JSchwartz Jul 13 '14 at 22:37
  • @JSchwartz The `StreamRequestHandler` closes the connection after the request was served. The client doesn't get an error for reading from a closed connection. As I said, the `recv()` returns an empty bytes object then. Maybe the [Socket Programming HOWTO](https://docs.python.org/2/howto/sockets.html) from the Python documentation is helpful here. – BlackJack Jul 13 '14 at 23:41
  • @JSchwartz I've expanded the example to runnable code and also added code for the server side. – BlackJack Jul 14 '14 at 08:54