A string variable sometimes includes octal characters that need to be un-octaled. Example: oct_var = "String\302\240with\302\240octals"
, the value of oct_var
should be "String with octals"
with non-breaking spaces.
Codecs doesn't support octal, and I failed to find a working solution with encode()
. The strings originate upstream outside my control.
Python 3.9.8
Edited to add: It doesn't have to scale or be ultra fast, so maybe the idea from here (#6) can work (not tested yet):
def decode(encoded):
for octc in (c for c in re.findall(r'\\(\d{3})', encoded)):
encoded = encoded.replace(r'\%s' % octc, chr(int(octc, 8)))
return encoded.decode('utf8')