0

I have an asyncio server, which is an example from the TCP docs. However, I'm connecting to it using pyzmq, and when the reader on the server tries to read, I get a decode error. Any hint is highly appreciated. I've already tried encoding to UTF-8 first, but that didn't help.

Server: (Python 3.6)

import asyncio

async def handle_echo(reader, writer):
    data = await reader.read(100)
    print(data)
    message = data.decode()


loop = asyncio.get_event_loop()
coro = asyncio.start_server(handle_echo, '127.0.0.1', 5555, loop=loop)
server = loop.run_until_complete(coro)
loop.run_forever()

Client: (Python 2.7)

import zmq
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect ("tcp://localhost:%s" % 5555)
socket.send("test")

Full Trace:

    future: <Task finished coro=<handle_echo() done, defined at "E:\Projects\AsyncIOserver.py:3> exception=UnicodeDecodeError('utf-8', b'\xff\x00\x00\x00\x00\x00\x00\x00\x01\x7f', 0, 1, 'invalid start byte')>
Traceback (most recent call last):
  File "E:\Projects\AsyncIOserver.py", line 6, in handle_echo
    message = data.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
user1767754

2 Answers

4

ZeroMQ uses the ZMTP protocol. It is a binary protocol, so you won't be able to decode the raw bytes directly as text.

If you're curious about it, inspect the ZMTP frames using Wireshark and the ZMTP plugin:

Wireshark + ZMTP

You can see that the bytes you got actually correspond to the signature of the ZMTP greeting message.
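
As a rough illustration, the bytes from the traceback can be checked against the shape of a ZMTP greeting signature: a leading 0xFF byte, eight padding bytes, then 0x7F. This is a minimal sketch; the helper name `looks_like_zmtp_signature` is made up for this example and is not part of pyzmq or any other library.

```python
def looks_like_zmtp_signature(data: bytes) -> bool:
    """Return True if data starts with a 10-byte ZMTP-style signature.

    Hypothetical helper for illustration only: it checks the first
    and tenth bytes and ignores the eight padding bytes in between.
    """
    return len(data) >= 10 and data[0] == 0xFF and data[9] == 0x7F


# The bytes reported in the UnicodeDecodeError from the question.
received = b'\xff\x00\x00\x00\x00\x00\x00\x00\x01\x7f'

print(looks_like_zmtp_signature(received))  # the server read a ZMTP greeting
print(looks_like_zmtp_signature(b"test"))   # a plain payload would not match
```

So the server never saw the text "test" at all; the first thing it read was the protocol handshake.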


In order to receive the messages from a ZMQ socket in asyncio, use a dedicated project like aiozmq:

import aiozmq
import asyncio

async def main(port=5555):
    bind = "tcp://*:%s" % port
    rep = await aiozmq.create_zmq_stream(aiozmq.zmq.REP, bind=bind)
    message, = await rep.read()
    print(message.decode())
    rep.write([message])

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
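
If ZeroMQ is not actually required on the client side, the opposite approach should also work: keep the plain asyncio TCP server and talk to it with a plain TCP client, so that no ZMTP handshake is ever sent. The following is a minimal sketch under that assumption. It uses `asyncio.run` (Python 3.7+), runs server and client in one process for brevity, and binds to port 0 so the OS picks a free port; none of these choices come from the question.

```python
import asyncio


async def handle_echo(reader, writer):
    # Read the raw bytes and echo them back unchanged.
    data = await reader.read(100)
    writer.write(data)
    await writer.drain()
    writer.close()


async def main():
    # Port 0 asks the OS for any free port (an assumption for this sketch).
    server = await asyncio.start_server(handle_echo, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]

    # Plain TCP client: the server receives exactly the bytes written here.
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write("test".encode("utf-8"))
    await writer.drain()
    reply = await reader.read(100)

    writer.close()
    server.close()
    return reply.decode("utf-8")


result = asyncio.run(main())
print(result)
```

Because both ends speak raw TCP, the bytes that arrive at the server are valid UTF-8 and `decode()` succeeds.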
Vincent
3

The byte ff is the first byte of a little-endian UTF-16 BOM. It has no place in a UTF-8 stream, where the maximum number of 1-bits at the start of the first byte of an encoded code point is four.

See an earlier answer of mine for more detail on the UTF-8 encoding.

As to fixing it, you'll need to receive what was sent. That will involve either fixing the transmission side to do UTF-8, or the reception side to do UTF-16.
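
To illustrate the general point about mismatched encodings, here is a small sketch: a payload that starts with a little-endian UTF-16 BOM cannot be decoded as UTF-8, but decodes cleanly as UTF-16. The payload is constructed by hand for the example and is not the data from the question.

```python
import codecs

# Build a payload with an explicit little-endian UTF-16 BOM (ff fe).
payload = codecs.BOM_UTF16_LE + "test".encode("utf-16-le")

# Decoding with the wrong codec fails on the very first byte.
try:
    payload.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("UTF-8 decode failed:", exc.reason)

# Decoding with the matching codec consumes the BOM and yields the text.
text = payload.decode("utf-16")
print(text)
```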

You may want to look into the differences between strings in Python 2 and 3, as this may well be what's causing your issue (see here).

paxdiablo
  • Thanks for the hints, I'm reading through them to make it work. I was thinking that if I send the string as bytes, `b"test"`, I should be able to decode it in `python3`, but that doesn't seem to work out of the box. – user1767754 Jan 29 '18 at 01:04
  • Hmm, that's weird. Just to narrow down the error sources, I'm now using `python3` for both client and server, and using `send_string` from pyzmq. I'm getting the same error message. – user1767754 Jan 29 '18 at 01:40
  • It seems that the `ff` byte was not related to UTF-16 BOM after all. – user4815162342 Jan 29 '18 at 19:14