I have created server.py and client.py with the intention of sending both text and binary files between the two. My code works for both small text and small binary files, however large binary files do not work.

In my testing, I use a 1.5 KB .ZIP file and I can send this without any problem. However, when I try sending a 44 MB .ZIP file, I am running into an issue.

My client code works as follows:

  1. The client creates a dictionary containing metadata about the file to be sent.
  2. The binary file is base64 encoded and is added as a value to the "filecontent" key of the dictionary.
  3. The dictionary is JSON serialised.
  4. The length of the serialised dictionary is calculated and prepended to it as a fixed-length (10-character) header.
  5. The client sends the entire message to the server.

On the server:

  1. The server receives the fixed-length header and interprets the size of the message in the transmission.
  2. The server reads the message in chunks of MAXSIZE (for testing set to 500), storing them temporarily.
  3. Once the entire message is received, the server joins the entire message.
  4. The server base64 decodes the value belonging to the "filecontent" key.
  5. Next, it writes the content of the file to disk.

As I said, this works fine for my 1.5 KB .ZIP file, but for the 44 MB .ZIP file it breaks in step 3 on the server. The error is thrown by the json.decoder. It complains about "Unterminated string starting at..."

While troubleshooting, I found that the last part of the message did not arrive, which explains the complaint from the json.decoder. I also found that the client sends 61841613 as the fixed-length header, where it should be 62279500: a difference of 437887.

When I do not let the client calculate the size of the message, but simply hardcode the size as 62279500, then everything works as expected. That leads me to believe there is something wrong with the way the client calculates the message size for larger files. However I cannot work out what's wrong.
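One way to test that hypothesis is to compare the character count of the serialised message with its encoded byte count, because the header is built from len(msg) (characters) while the server counts received bytes. The sketch below assumes default `json.dumps` settings and a base64 payload, both of which produce ASCII-only text; `frame` is a hypothetical helper that is not part of the original program, and the sample payload is made up for illustration.

```python
import base64
import json


def frame(msg_dict):
    """Hypothetical helper: prefix the JSON body with a 10-character size header.

    The size is taken from the encoded bytes, so it stays correct even if the
    text ever contains non-ASCII characters.
    """
    body = json.dumps(msg_dict).encode("utf-8")
    header = "{:<10}".format(len(body)).encode("utf-8")
    return header + body


# Made-up binary payload standing in for the file contents.
payload = base64.b64encode(bytes(range(256)) * 4).decode("ascii")
text = json.dumps({"filename": "test.zip", "filecontent": payload})

# With default settings the JSON text is pure ASCII, so both counts agree.
print(len(text) == len(text.encode("utf-8")))

# The counts only diverge when non-ASCII characters are written unescaped.
non_ascii = json.dumps({"author": "Zoë"}, ensure_ascii=False)
print(len(non_ascii), len(non_ascii.encode("utf-8")))
```

If the two counts agree for the real message, the client-side length is probably not the culprit and the receiving loop is the next place to look.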

Here are the relevant parts of the code:

# client.py

connected = True
while connected:
    # Actual dictionary contains more metadata
    msg = { "filename" : "test.zip" , "author" : "marc" , "filecontent" : "" }

    myfile = open("test.zip", "rb")
    encoded = base64.b64encode(myfile.read())
    msg["filecontent"] = encoded.decode("ascii")

    msg = json.dumps(msg)
    header = "{:<10}".format(len(msg))
    header_msg = header + msg

    client.sendall(header_msg.encode("utf-8"))

# server.py

HEADER = 10
MAXSIZE = 500

connected = True
while connected:
    msg = conn.recv(HEADER).decode("utf-8")
    SIZE = int(msg)

    totalmsg = []
    while SIZE > 0:
        if SIZE > MAXSIZE:
            msg = conn.recv(MAXSIZE).decode("utf-8")
            totalmsg.append(msg)
            SIZE = SIZE - MAXSIZE
        else:
            msg = conn.recv(SIZE).decode("utf-8")
            totalmsg.append(msg)
            SIZE = 0

    msg = json.loads("".join(totalmsg))
    decoded = base64.b64decode(msg["filecontent"])

    myfile = open(msg["filename"], "wb")
    myfile.write(decoded)
    myfile.close()
Marc
  • Does `myfile.read()` read and return the entire file? Does `encoded` contain what you expect? And `msg["filecontent"]`, etc.. – rveerd Apr 14 '22 at 14:28
  • `conn.recv()` reads a _maximum_ of `MAXSIZE` bytes. It can read and return less bytes. You should check the number of bytes read (before calling `.decode()`) and decrement SIZE with the actual number of bytes returned. – rveerd Apr 14 '22 at 14:34
  • There's no good reason to encode the file as base64 and JSON. Why not just send the binary file contents? – Barmar Apr 14 '22 at 15:05
  • https://stackoverflow.com/a/43420503/238704 – President James K. Polk Apr 14 '22 at 21:15

1 Answer

As mentioned in the comments, conn.recv(MAXSIZE) receives at most MAXSIZE bytes but can return fewer. The server code assumes it always receives the amount requested and decrements SIZE by MAXSIZE regardless, so it stops reading before the whole message has arrived. There is also no reason to base64-encode the file data; it only inflates the payload by roughly a third. Sockets are a byte stream, so just send the bytes.
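To illustrate the short-read point on its own, here is a minimal sketch of a receive loop that subtracts the number of bytes actually returned. The name `recv_exact` and the 4096-byte chunk size are assumptions made for this example; they do not come from the question's code.

```python
import socket


def recv_exact(sock, n, chunk_size=4096):
    """Read exactly n bytes from sock, looping over short reads."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, chunk_size))
        if not chunk:  # empty bytes means the peer closed the connection
            raise ConnectionError(
                f"socket closed with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)  # count what arrived, not what was requested
    return b"".join(chunks)


if __name__ == "__main__":
    # A connected socket pair stands in for a real client and server.
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"hello ")
        a.sendall(b"world")
        print(recv_exact(b, 11))  # prints b'hello world'
```

Decoding to text is best done once on the joined bytes rather than per chunk, since a chunk boundary could otherwise split a multi-byte character.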

The header can be separated from the data by a marker. Below I've used CRLF, written the header as a single JSON line, and also demonstrated sending a couple of files on the same connection:

client.py

import socket
import json

def transmit(sock, filename, author, content):
    msg = {'filename': filename, 'author': author, 'length': len(content)}
    data = json.dumps(msg, ensure_ascii=False).encode() + b'\r\n' + content
    sock.sendall(data)

client = socket.socket()
client.connect(('localhost',5000))
with client:
    with open('test.zip','rb') as f:
        content = f.read()
    transmit(client, 'test.zip', 'marc', content)
    content = b'The quick brown fox jumped over the lazy dog.'
    transmit(client, 'mini.txt', 'Mark', content)

server.py

import socket
import json
import os

os.makedirs('Downloads', exist_ok=True)

s = socket.socket()
s.bind(('',5000))
s.listen()

while True:
    c, a = s.accept()
    print('connected:', a)
    r = c.makefile('rb')   # wrap socket in a file-like object
    with c, r:
        while True:
            header_line = r.readline() # read in a full line of data
            if not header_line: break
            header = json.loads(header_line) # process the header
            print(header)
            remaining = header['length']
            with open(os.path.join('Downloads',header['filename']), 'wb') as f:
                while remaining :
                    # Unlike socket.recv() the makefile object won't return less
                    # than requested unless the socket is closed.
                    count = f.write(r.read(min(10240, remaining)))
                    if not count:  # socket closed?
                        if remaining:
                            print('Unsuccessful')
                        break
                    remaining -= count
                else:
                    print('Success')
    print('disconnected:', a)

Output:

connected: ('127.0.0.1', 14117)
{'filename': 'test.zip', 'author': 'marc', 'length': 52474063}
Success
{'filename': 'mini.txt', 'author': 'Mark', 'length': 45}
Success
disconnected: ('127.0.0.1', 14117)
Mark Tolonen
  • Mark, thank you for taking time to write this code. I have taken your code and used it in my own program, however, I am getting a *connectionResetError: [Errno 104] Connection reset by peer* on the server. When I run your code exactly, it works fine. For troubleshooting, I am writing *remaining* to the console. The last number reported differs each time: 1407047, 1540167, 884807, etc. Not sure if this has to do with the fact that I am using SSL to secure the connection or the fact that the server handles each client in a thread. – Marc Apr 16 '22 at 12:28
  • After doing some research, it looks like it may be threading related. I will do some further investigation and possibly open a new question. Thank you to every one for their comments and effort. Especially @Mark, that really helped a lot. – Marc Apr 16 '22 at 15:19