TL;DR version:
I know that Python 3's str type stores text as a sequence of Unicode code points by default, whereas Python 2.6+ stores plain strings as byte sequences.
# test.py
a = "\xEF\xEB"
print(a)
$ python test.py | hexdump -C
00000000 ef eb 0a
$ python3 test.py | hexdump -C
00000000 c3 af c3 ab 0a
I need the strings in my Python 3 code to behave exactly like those in Python 2 (i.e., to contain the exact bytes of the original string, with no Unicode conversion on output).
Longer version:
I was in the process of migrating some web server code from Python 2 to 3 and encountered a significant but hopefully easy-to-solve problem. As an example:
# test.py
from struct import pack
port = pack('>H', 5100).decode('ISO-8859-1')
print(port)
$ python3 test.py | hexdump -C
00000000 13 c3 ac 0a
# test.py
from struct import pack
port = pack('>H', 5100)
print(port)
$ python test.py | hexdump -C
00000000 13 ec 0a
The Unicode strings Python 3 uses are causing a huge problem for my apps because they were written against predetermined offsets (i.e., certain bytes are expected at certain positions), and the multi-byte encoding applied on output throws those offsets off.
Is there a way to convert a Python 3 string into a "regular" string like what we had in Python 2, so that the string "\xEF\xEB" is treated as exactly those two bytes?
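The closest I have come is the sketch below; it assumes a bytes literal (or a latin-1 round trip) is an acceptable substitute for the Python 2 str:

```python
# A bytes literal gives the raw bytes directly:
raw = b"\xEF\xEB"

# If the data is already a Python 3 str, encoding with ISO-8859-1 maps
# code points U+0000-U+00FF back to the identical byte values:
assert "\xEF\xEB".encode("ISO-8859-1") == raw == b"\xef\xeb"
```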