I need to decode a byte string into Unicode and insert it into a Unicode string (Python 2.7). When I later encode that Unicode string back into bytes, the result must be identical to the original byte string. Which encoding should I use to achieve this?
Example:
# every possible byte value, 0x00 through 0xFF
byteString = b"".join([chr(ii) for ii in xrange(256)])
# this decode is the line that raises: bytes >= 0x80 are not valid ASCII
unicodeString = u"{0}".format(byteString.decode("ascii"))
backToBytes = unicodeString.encode("ascii")
assert byteString == backToBytes
The decode call fails with the infamous:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 128: ordinal not in range(128)
What encoding should I use here (instead of "ascii") so that every byte value survives the round trip?
I am using "ascii" in this (currently broken) example because it is my default encoding:
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
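For comparison, here is a sketch of the same round trip with "latin-1" (ISO-8859-1) in place of "ascii". Treat the choice of codec as an assumption rather than a confirmed answer: latin-1 maps each byte value 0 to 255 to the Unicode code point with the same number, so the decode should never fail and the encode should return the original bytes. The byte string is built with bytearray instead of chr/xrange only so that the snippet runs unchanged on Python 2.7 and Python 3.

```python
# Sketch: round-trip all 256 byte values through a unicode string.
# Assumption: "latin-1" is the codec under test, swapped in for "ascii".

# every possible byte value, 0x00 through 0xFF
byteString = bytes(bytearray(range(256)))

# latin-1 decodes byte N to code point U+00NN, so no byte is rejected
unicodeString = u"{0}".format(byteString.decode("latin-1"))

# encoding with the same codec maps U+00NN back to byte N
backToBytes = unicodeString.encode("latin-1")

assert byteString == backToBytes
```

Note that this only round-trips as long as nothing outside U+0000 to U+00FF is added to the unicode string; encoding any other character with latin-1 raises UnicodeEncodeError.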