I have a program that takes a string, turns it into a list of the bits of its byte representation, and then converts that list back into a string. This is easy if the string contains only ASCII characters:
```python
def messagetobitlist(message):
    bitlist = []
    for i in message:
        # 8-bit binary string of each character's code, e.g. 'A' -> '01000001'
        for x in format(ord(i), '08b'):
            bitlist.append(int(x))
    return bitlist
```
Then I simply convert it back with unichr (chr would also work).
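For reference, the reverse direction for the ASCII case might look like the sketch below. `bitlisttomessage` is a hypothetical name, and the sketch assumes the bit list length is a multiple of 8 and that every character code is below 128.

```python
def bitlisttomessage(bitlist):
    """Hypothetical inverse of messagetobitlist for ASCII-only input."""
    chars = []
    # Read the bits 8 at a time; each group is one character code
    for start in range(0, len(bitlist), 8):
        byte_bits = bitlist[start:start + 8]
        code = int(''.join(str(bit) for bit in byte_bits), 2)
        chars.append(chr(code))  # chr is enough for codes 0-127
    return ''.join(chars)
```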
However, I want to extend the code so that it accepts strings with accents and other non-ASCII characters. My idea was to encode the string in UTF-8 and build the bit list from that, but converting it back doesn't work: characters are represented by different numbers of bytes, and my code has no way of knowing in advance whether it has to read one byte or more. I tried encoding every character with 4 bytes (the maximum length of a UTF-8 sequence), but that seems a real waste of space, and it doesn't work anyway.
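For what it's worth, UTF-8 records each sequence's length in the high bits of its first byte (`0xxxxxxx` = 1 byte, `110xxxxx` = 2, `1110xxxx` = 3, `11110xxx` = 4), so the number of bytes to read can be determined before reading them. A minimal sketch, assuming the bit list was built from UTF-8 bytes; `utf8_sequence_length` and `split_characters` are hypothetical helpers:

```python
def utf8_sequence_length(first_byte_bits):
    """Return how many bytes a UTF-8 sequence occupies, given the
    8 bits (list of 0/1 ints) of its first byte."""
    if first_byte_bits[0] == 0:
        return 1                          # 0xxxxxxx: ASCII
    if first_byte_bits[:3] == [1, 1, 0]:
        return 2                          # 110xxxxx
    if first_byte_bits[:4] == [1, 1, 1, 0]:
        return 3                          # 1110xxxx
    if first_byte_bits[:5] == [1, 1, 1, 1, 0]:
        return 4                          # 11110xxx
    # 10xxxxxx is a continuation byte; 11111xxx never occurs in UTF-8
    raise ValueError('not a valid UTF-8 leading byte')


def split_characters(bitlist):
    """Split a bit list of UTF-8 bytes into one bit group per character."""
    groups = []
    pos = 0
    while pos < len(bitlist):
        n = utf8_sequence_length(bitlist[pos:pos + 8])
        groups.append(bitlist[pos:pos + 8 * n])
        pos += 8 * n
    return groups
```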
Is there a way to write a function that does this while still being reasonably space-efficient?
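One possible approach, sketched under the assumption that the input is a `unicode` object in Python 2.7 (a `str` in Python 3): encode the whole string to UTF-8, store its bytes as bits, and on the way back rebuild all the bytes first and decode them in a single call, letting the UTF-8 decoder find the character boundaries. `bitlisttomessage` is a hypothetical name; `bytearray` is used because iterating over it yields integers in both Python 2.7 and 3.

```python
def messagetobitlist(message):
    """Encode a unicode string as UTF-8 and return its bits as 0/1 ints."""
    bitlist = []
    # bytearray yields ints in both Python 2.7 and Python 3
    for byte in bytearray(message.encode('utf-8')):
        for bit in format(byte, '08b'):
            bitlist.append(int(bit))
    return bitlist


def bitlisttomessage(bitlist):
    """Rebuild the UTF-8 bytes 8 bits at a time, then decode them all at once."""
    data = bytearray()
    for start in range(0, len(bitlist), 8):
        byte_bits = bitlist[start:start + 8]
        data.append(int(''.join(str(bit) for bit in byte_bits), 2))
    # The decoder works out how many bytes each character uses
    return data.decode('utf-8')
```

For `u'caf\u00e9'` this produces 40 bits (three 1-byte characters plus a 2-byte `é`), compared with 128 bits if every character were padded to 4 bytes.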
EDIT: Whoops, I wrote Python 3 instead of Python 2.7.