
I have a program that takes a string, builds a list of bits from its byte representation, and then converts the list back to a string. This is easy when the string contains only ASCII characters:

def messagetobitlist(message):
    # Build a flat list of bits, 8 per character (ASCII input only).
    bitlist = []
    for i in message:
        for x in format(ord(i), '08b'):
            bitlist.append(int(x))
    return bitlist

I then convert it back with `unichr` (`chr` would also work).
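For illustration, the reverse step for the ASCII case could look like the sketch below. The name `bitlisttomessage` is hypothetical (the question does not show this function), and it assumes the list holds exactly 8 bits per character, as produced by `messagetobitlist`.

```python
def bitlisttomessage(bitlist):
    # Hypothetical inverse of messagetobitlist: regroup the bits into
    # chunks of 8 and turn each chunk back into one character.
    chars = []
    for start in range(0, len(bitlist), 8):
        chunk = bitlist[start:start + 8]
        # Join the 8 bits into a binary string and parse it as an integer.
        code = int(''.join(str(bit) for bit in chunk), 2)
        chars.append(chr(code))
    return ''.join(chars)
```

`chr` is used here because it exists in both Python 2.7 and Python 3; on Python 2.7 `unichr` could be substituted if a unicode result is wanted.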

However, I want to extend the code so that it accepts strings with accents and foreign characters. To do this I thought of encoding the string in UTF-8 and building the bit list from that, but converting it back doesn't work: characters are represented with different numbers of bytes, and the code can't tell beforehand whether it has to read one byte or more. I tried encoding every character with 4 bytes (the maximum in UTF-8), but that seems a waste of space and it doesn't work anyway.
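A minimal sketch of the UTF-8 variant described above, under two assumptions: the input is a unicode object (`u'...'` in Python 2.7), and the bits are regrouped into bytes so that the whole byte string is decoded in one call, leaving it to the UTF-8 decoder to work out how many bytes each character uses. The names `text_to_bits` and `bits_to_text` are hypothetical.

```python
def text_to_bits(text):
    # Assumes `text` is a unicode object (u'...' in Python 2.7, str in Python 3).
    # bytearray yields integers when iterated, on both Python versions.
    data = bytearray(text.encode('utf-8'))
    bits = []
    for byte in data:
        for x in format(byte, '08b'):
            bits.append(int(x))
    return bits


def bits_to_text(bits):
    # Regroup the bits into bytes first, then decode all bytes at once.
    data = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        data.append(int(''.join(str(b) for b in chunk), 2))
    return bytes(data).decode('utf-8')


sample = u'caf\u00e9'            # 'e' with an acute accent takes 2 bytes in UTF-8
bits = text_to_bits(sample)      # 5 bytes in total, so 40 bits
restored = bits_to_text(bits)
```

With this layout each ASCII character still costs 8 bits, and only characters outside ASCII take 2 to 4 bytes.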

Is there a way to write a function that does this while still being reasonably space-efficient?

EDIT: Whoops, wrote Python 3 instead of Python 2.7

anthony sottile
  • Possible duplicate of [How to convert between bytes and strings in Python 3?](http://stackoverflow.com/questions/14010551/how-to-convert-between-bytes-and-strings-in-python-3) – SparkAndShine Dec 07 '16 at 17:13
  • I wrote Python 3 instead of Python 2.7, so that answer doesn't work for me; it gives a `UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)` – Francesco Carzaniga Dec 07 '16 at 17:22

0 Answers