
I know this question has been asked before, but I couldn't get it working. What I want to do is send a prefix with my message, like so:

msg = pickle.dumps(message)
prefix = b'{:0>5d}'.format(len(msg))
message = prefix + msg

This gives me

AttributeError: 'bytes' object has no attribute 'format'

I tried formatting with % and encoding but none of them worked.

boortmans
  • `bytes` doesn't support formatting, not before Python 3.5 (not yet out). Decode to `str`, then encode again, or rethink what you are doing. – Martijn Pieters Jun 05 '14 at 16:51
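To illustrate the comment above: Python 3.5 and later accept %-style formatting on `bytes` (PEP 461), while `bytes` still has no `.format()` method. The snippet below is a minimal sketch; the `message` dictionary is a made-up payload used only for illustration.

```python
import pickle

message = {"greeting": "hi there"}  # hypothetical payload, not from the question
msg = pickle.dumps(message)

# Python 3.5+: %-formatting works directly on bytes (PEP 461).
prefix = b"%05d" % len(msg)

# Any Python 3: format as str, then encode the digits to bytes.
prefix_portable = "{:0>5d}".format(len(msg)).encode("ascii")

framed = prefix + msg  # bytes + bytes concatenates fine
```

Both prefixes are the same five ASCII digits, so the receiving side does not need to know which variant produced them.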

2 Answers


You can't format a bytes literal. You also can't concatenate bytes objects with str objects. Instead, put the whole thing together as a str, and then convert it to bytes using the proper encoding.

msg = 'hi there'
prefix = '{:0>5d}'.format(len(msg)) # No b at the front--this is a str
str_message = prefix + msg # still a str
encoded_message = str_message.encode('utf-8') # or whatever encoding

print(encoded_message) # prints: b'00008hi there'

Or if you're a fan of one-liners:

encoded_message = bytes('{:0>5d}{}'.format(len(msg), msg), 'utf-8')

According to your comment on @Jan-Philip's answer, you need to specify how many bytes you're about to transfer? Given that, you'll need to encode the message first, so you can properly determine how many bytes it will be when you send it. The len function returns the byte count when called on a bytes object, so something like this should work for arbitrary text:

msg = 'ü' # len(msg) is 1 character
encoded_msg = msg.encode('utf-8') # len(encoded_msg) is 2 bytes
encoded_prefix = '{:0>5d}'.format(len(encoded_msg)).encode('utf-8')
full_message = encoded_prefix + encoded_msg # both are bytes, so we can concat

print(full_message) # prints: b'00002\xc3\xbc'
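For the receiving side, here is a rough sketch of how the five-digit prefix could be parsed back out of a byte stream. The names `frame`, `read_frame`, and `PREFIX_LEN` are invented for this example, and an `io.BytesIO` object stands in for the real connection.

```python
import io

PREFIX_LEN = 5  # number of ASCII digits in the length prefix (hypothetical constant)

def frame(text):
    """Encode text as UTF-8 and prepend a zero-padded 5-digit byte count."""
    payload = text.encode("utf-8")
    if len(payload) > 10 ** PREFIX_LEN - 1:
        raise ValueError("payload too long for a 5-digit prefix")
    return "{:0>5d}".format(len(payload)).encode("ascii") + payload

def read_frame(stream):
    """Read one length-prefixed message from a binary file-like object."""
    prefix = stream.read(PREFIX_LEN)
    if len(prefix) != PREFIX_LEN:
        raise EOFError("stream ended inside the length prefix")
    size = int(prefix.decode("ascii"))
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("stream ended inside the payload")
    return payload.decode("utf-8")

# Two messages back to back, as they might arrive on a stream.
stream = io.BytesIO(frame("ü") + frame("hi there"))
first = read_frame(stream)
second = read_frame(stream)
```

Because the prefix always has exactly five bytes, the reader knows where the length ends and the payload begins. On a real socket, `recv` may return fewer bytes than requested, so the reads would have to loop until the full count has arrived.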
Henry Keiter
  • He is dealing with a stream transport (saying "so the server knows what length will come"), so he needs to know how many bytes exactly encode the message length. Is that deterministic with your approach? Can you show how you would decode the message size from a byte stream? – Dr. Jan-Philip Gehrcke Jun 05 '14 at 16:49
  • @Jan-PhilipGehrcke Depends on his encoding, presumably. If that's the case, he's got other problems, since he's got his `msg` set up as a string, and the `len` of that isn't necessarily the number of bytes he'll be sending, as you say. – Henry Keiter Jun 05 '14 at 16:52
  • @Jan-PhilipGehrcke I've updated the answer with a stable solution for his actual use case. – Henry Keiter Jun 05 '14 at 17:02
  • Thank you both for the replies! I found this answer the most helpful, since I didn't need to change anything on the server side. – boortmans Jun 05 '14 at 17:15
  • @Barto: just note that the custom protocol above requires 5 bytes for length encoding, with a maximum length of 99999. Using a native data type such as unsigned long (`L` format specifier for Python's `struct` module), the maximum encodable message length is 2^32-1 (4294967295), i.e. nearly five orders of magnitude larger, within only 4 bytes, i.e. one byte less than in the approach above. In real-world protocols you would use the latter approach in order to optimize the ratio between protocol overhead and actual payload. – Dr. Jan-Philip Gehrcke Jun 06 '14 at 12:15
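The size comparison in the last comment can be checked with a few lines. This sketch assumes the `!L` format (network byte order, standard sizes), where `L` is always 4 bytes; in native mode without a byte-order prefix, the size of `L` depends on the platform.

```python
import struct

# ASCII-digit prefix: 5 bytes, largest encodable length is 99999.
ascii_prefix = "{:0>5d}".format(99999).encode("ascii")

# Binary prefix: 4 bytes, largest encodable length is 2**32 - 1.
max_binary = 2 ** 32 - 1
binary_prefix = struct.pack("!L", max_binary)

ascii_size = len(ascii_prefix)
binary_size = len(binary_prefix)
```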

Edit: I think I misunderstood your question. Your issue is that you can't get the length into a bytes object, right?

Okay, you would usually use the struct module for that, in this fashion:

import struct
framed = struct.pack("!i", len(bindata)) + bindata  # bindata: the message as bytes

This writes the length of the (binary!) message into a four-byte integer. The return value of pack() is that integer, packed into an object of type bytes. To decode it on the receiving end, you need to read exactly the first 4 bytes of your message into a bytes object. Let's call this first_four_bytes. Decoding is done with struct.unpack, using the same format specifier (!i in this case):

messagesize, = struct.unpack("!i", first_four_bytes)

Then you know exactly how many of the following bytes belong to the message: messagesize. Read exactly that many bytes, and decode the message.
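Put together, the send and receive steps might look like the sketch below. The helper names `send_msg`, `recv_exact`, and `recv_msg` are invented for this example. It also swaps in `!I` (unsigned) for the `!i` used above, on the assumption that a length is never negative. A `socket.socketpair()` stands in for a real client/server connection.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned int, network byte order (assumed choice)

def send_msg(sock, payload):
    """Send payload (bytes) preceded by its length as a 4-byte header."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, n):
    """Keep calling recv until exactly n bytes have been collected."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    """Read the 4-byte header, then exactly that many payload bytes."""
    (size,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, size)

# Demo over a connected pair of local sockets.
a, b = socket.socketpair()
try:
    send_msg(a, b"hi there")
    send_msg(a, "ü".encode("utf-8"))
    first = recv_msg(b)
    second = recv_msg(b)
finally:
    a.close()
    b.close()
```

The loop in `recv_exact` matters on a stream transport: a single `recv` call may return only part of the requested bytes, so the receiver keeps reading until the count from the header is satisfied.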

Old answer:

In Python 3, the + operator (__add__) on two bytes objects concatenates them, which is what we want:

>>> a = b"\x61"
>>> b = b"\x62"
>>> a + b
b'ab'
Dr. Jan-Philip Gehrcke