Error when struct unpacking variable-length string over a socket

Question

I have a server program and a client program, and I'm trying to send messages between the two using a specified protocol and a fixed buffer size of 16 bytes. The procedure works like this:

Client has two individual string expressions defined.
Client packs the two expressions into a single struct based on the protocol (using UTF-8 encoding).
Client sends packed expression to server.
Server receives packed expression.
Server unpacks packed expression.
Server sends back (echoes) the first expression.
Client receives the first expression and prints it (decoded by UTF-8).

Essentially it's a simple echo procedure, though I'm only sending back the first expression for now (I'll change this later).

Here's a picture I drew of the protocol I need to use. I pack my struct according to this. That is, the first two values are two-byte ints, followed by a variable-length expression/string, followed by another two-byte int, then the second variable-length expression/string. So according to the struct documentation, that corresponds to 2h{len(exp1)}sh{len(exp2)}s.

I'm also using network endianness.
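To make the layout concrete, here is a minimal sketch of the packing step for the two inputs used in this question. The helper name `pack_expressions` is hypothetical (it mirrors the `pack_expression` function in client.py further down), and the leading value 2 is assumed to be the number of expressions, as in the client code.

```python
import struct


def pack_expressions(exp1: str, exp2: str) -> bytes:
    """Pack two strings as: count, len(exp1), exp1, len(exp2), exp2."""
    b1 = exp1.encode("utf-8")
    b2 = exp2.encode("utf-8")
    # "!" selects network (big-endian) byte order with no alignment padding.
    fmt = f"!2h{len(b1)}sh{len(b2)}s"
    return struct.pack(fmt, 2, len(b1), b1, len(b2), b2)


short = pack_expressions("123456", "0000")
longer = pack_expressions("123456", "00001")

print(short, len(short))    # 2 + 2 + 6 + 2 + 4 = 16 bytes
print(longer, len(longer))  # 2 + 2 + 6 + 2 + 5 = 17 bytes
```

The first message is exactly 16 bytes and the second is 17, which is where the two cases below start to differ.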

When both my expressions are under 16 bytes, the program works fine:

CLIENT INPUT:

exp1 = "123456"
exp2 = "0000"

SERVER OUTPUT:

Waiting for connections...
Connected from ('127.0.0.1', 51150).
Server received 16 bytes from ('127.0.0.1', 51150). The message was b'\x00\x02\x00\x06123456\x00\x040000'.
The length of the first expression is 6. The length of the second expression is 4.
Bye!

CLIENT OUTPUT:

Client received 123456.  # Correct

But the problem is that when the packed message is longer than 16 bytes, I get an unpack error on the server side:

CLIENT INPUT:

exp1 = "123456"
exp2 = "00001"  # <---- I've now added an extra "1" here (but it could be any character).

SERVER OUTPUT:

Waiting for connections...
Connected from ('127.0.0.1', 51345).
Server received 16 bytes from ('127.0.0.1', 51345). The message was b'\x00\x02\x00\x06123456\x00\x050000'.
The length of the first expression is 6. The length of the second expression is 5.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "D:\Program Files\Python\Python37\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "D:\Program Files\Python\Python37\lib\threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "temp-server.py", line 39, in connection_func
    unpacked_response = unpack_data(data)
  File "temp-server.py", line 21, in unpack_data
    exp2 = struct.unpack(f'!{exp2_length}s', packed_data[4 + exp1_length + 2:4 + exp1_length + 2 + exp2_length])[0]
struct.error: unpack requires a buffer of 5 bytes

Since this struct error occurs for any input whose total packed length is over 16 bytes, I suspect it has to do with my buffer size. However, keeping the buffer size for socket.recv() at 16 bytes is a requirement, and I don't know how to read the data 16 bytes at a time and still unpack the struct. I also need to handle expressions (exp1 and exp2) of any length (i.e. variable-length strings); that's why my struct format is built dynamically from the lengths of the expressions.
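The failure can be reproduced without a socket. The sketch below assumes that the single `recv(16)` call returned only the first 16 bytes of the 17-byte message, which matches the bytes shown in the server output above; the variable names are illustrative.

```python
import struct

# Full 17-byte message for exp1 = "123456", exp2 = "00001".
message = struct.pack("!2h6sh5s", 2, 6, b"123456", 5, b"00001")

# One recv(16) returns at most 16 bytes, so the last byte is missing.
first_chunk = message[:16]

exp1_length = struct.unpack("!h", first_chunk[2:4])[0]
exp2_offset = 4 + exp1_length
exp2_length = struct.unpack("!h", first_chunk[exp2_offset:exp2_offset + 2])[0]

# The header says exp2 is 5 bytes long, but only 4 of them have arrived.
available = first_chunk[exp2_offset + 2:exp2_offset + 2 + exp2_length]

try:
    struct.unpack(f"!{exp2_length}s", available)
    truncated = False
except struct.error as exc:
    truncated = True
    print(f"struct.error: {exc}")
```

So the length header for the second expression is read correctly, but the slice holding its bytes is one byte short, and `struct.unpack` rejects it.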

I've read through this thread here, but was still unsuccessful when I tried to implement some of the ideas mentioned; namely, I still had issues unpacking the struct, and I don't have any "end" character in the protocol to indicate when all the data has been sent over the socket.

Here's my code:

server.py

import socket
import struct
import threading

HOST = "127.0.0.1"
PORT = 65432
BUFFER_SIZE = 16
NUM_RESPONSES = 2


def unpack_data(packed_data):
    exp1_length = int(struct.unpack('!h', packed_data[2:4])[0])
    exp2_length = int(struct.unpack('!h', packed_data[4 + exp1_length: 4 + exp1_length + 2])[0])
    print(f"The length of the first expression is {exp1_length}. The length of the second expression is {exp2_length}.")

    # Extract the variable-length expressions (according to their locations in the protocol)
    exp1 = struct.unpack(f'!{exp1_length}s', packed_data[4: 4 + exp1_length])[0]
    exp2 = struct.unpack(f'!{exp2_length}s', packed_data[4 + exp1_length + 2:4 + exp1_length + 2 + exp2_length])[0]

    return exp1, exp2


def connection_func(conn, addr):
    with conn:
        while True:
            data = conn.recv(BUFFER_SIZE)
            if not data:
                print("Bye!")
                break
            print(f"Server received {len(data)} bytes from {addr}. The message was {data}.")

            # Unpack the received expression.
            unpacked_response = unpack_data(data)

            # Send back the first expression.
            conn.sendall(unpacked_response[0])


def main():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen()
        print("Waiting for connections...")

        conn, addr = s.accept()
        print(f"Connected from {addr}.")
        threading.Thread(target=connection_func, args=(conn, addr)).start()


main()

client.py

import socket
import struct

HOST = "127.0.0.1"
PORT = 65432
BUFFER_SIZE = 16
NUM_EXPRESSIONS = 2


def pack_expression(exp1, exp2):
    encoded_exp1 = exp1.encode('utf-8')
    encoded_exp2 = exp2.encode('utf-8')
    struct_format = f'!2h{len(encoded_exp1)}sh{len(encoded_exp2)}s'  # According to protocol

    packed_exp = struct.pack(struct_format, NUM_EXPRESSIONS, len(encoded_exp1), encoded_exp1, len(encoded_exp2), encoded_exp2)
    return packed_exp


def main():
    exp1 = "123456"
    exp2 = "0000"

    # Pack the two expressions before sending to the server.
    packed_exp = pack_expression(exp1, exp2)

    # Open a socket and send the packed info to the server.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((HOST, PORT))
        s.sendall(packed_exp)
        data = s.recv(BUFFER_SIZE)

    print(f"Client received {data.decode('utf-8')}.")


main()

This makes no sense to me: you have the requirement that your message fits into 16 bytes but at the same time have the requirement that the parts of the message (the expressions) can be of arbitrary length. These requirements obviously contradict each other, which means you cannot comply with both. - Steffen Ullrich, Sep 28 '19 at 04:45
@SteffenUllrich Sorry for the confusion. I mean the buffer size (for server and client) is required to be 16 bytes. The actual expressions themselves -- which come from the client -- can be variable-length strings. E.g. they can be 80-byte strings, or they can be 4-byte strings. But the buffer size that's used to read in the data over the socket (`socket.recv()`) is fixed at 16 bytes. See the documentation for `socket.recv()` [here](https://docs.python.org/3.7/library/socket.html#socket.socket.recv). - Alureon, Sep 28 '19 at 07:16
First, even though you call `recv(16)`, it is not guaranteed to return 16 bytes. The given size is only an upper bound, and it will not wait until 16 bytes are available. `recv` will also not necessarily match what the other side does with `send`, since TCP is a stream protocol (the unit is bytes) and not a message protocol. Then, your structure clearly states the sizes of the different parts. So once you have the first 4 bytes you can already extract what size `exp1` must be and can read the remaining bytes, including the next 2 bytes, which give you the size of `exp2`. - Steffen Ullrich, Sep 28 '19 at 08:53
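The approach described in the last comment could be sketched as follows. This is only an illustration under stated assumptions: `recv_exact` and `read_message` are hypothetical helper names, the first two-byte field is assumed to be the number of expressions (as `NUM_EXPRESSIONS` in client.py suggests), and `socket.socketpair()` stands in for the real client/server connection. Each `recv()` call still asks for at most 16 bytes.

```python
import socket
import struct

BUFFER_SIZE = 16  # fixed per-call recv() limit from the question


def recv_exact(conn, num_bytes):
    """Read exactly num_bytes, never requesting more than BUFFER_SIZE per call."""
    chunks = []
    remaining = num_bytes
    while remaining > 0:
        chunk = conn.recv(min(BUFFER_SIZE, remaining))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_message(conn):
    """Read one message: an expression count, then (length, bytes) per expression."""
    (count,) = struct.unpack("!h", recv_exact(conn, 2))
    expressions = []
    for _ in range(count):
        (length,) = struct.unpack("!h", recv_exact(conn, 2))
        expressions.append(recv_exact(conn, length))
    return expressions


# Usage: a connected socket pair plays the roles of client and server.
client, server = socket.socketpair()
with client, server:
    exp1 = b"1" * 80   # longer than one 16-byte read
    exp2 = b"00001"
    fmt = f"!2h{len(exp1)}sh{len(exp2)}s"
    client.sendall(struct.pack(fmt, 2, len(exp1), exp1, len(exp2), exp2))
    received = read_message(server)

print(received)
```

Because the length fields say how many bytes follow, no "end" character is needed: the reader loops on `recv()` until the announced number of bytes has arrived, however the stream happens to be split.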
