I have a filename that contains %ed%a1%85%ed%b7%97.svg and want to decode it to its proper string representation in Python 3. I know the result should be 𡗗.svg, but the following code does not work:
import urllib.parse
import codecs
input = '%ed%a1%85%ed%b7%97.svg'
unescaped = urllib.parse.unquote(input)
raw_bytes = bytes(unescaped, "utf-8")
decoded = codecs.escape_decode(raw_bytes)[0].decode("utf-8")
print(decoded)
This prints ������.svg. It does work, however, when input is a string like %e8%b7%af.svg, which correctly decodes to 路.svg.
I've tried to decode this with online tools such as https://mothereff.in/utf-8 by replacing % with \x, which gives \xed\xa1\x85\xed\xb7\x97.svg. The tool correctly decoded this input to 𡗗.svg.
What happens here?
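
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the six bytes look like a UTF-16 surrogate pair (U+D845, U+DDD7) in which each half was encoded as its own 3-byte sequence, as in CESU-8, instead of as a single 4-byte UTF-8 sequence. Strict UTF-8 decoding rejects encoded surrogates, which would account for the replacement characters. Assuming that is the case, the snippet below decodes with the "surrogatepass" error handler and then round-trips through UTF-16 to join the pair. The function name decode_cesu8_filename is hypothetical and only used for illustration.

```python
import urllib.parse


def decode_cesu8_filename(quoted: str) -> str:
    """Decode a percent-encoded name whose bytes may contain
    surrogate halves encoded individually (CESU-8 style).

    Hypothetical helper; assumes the input is otherwise valid UTF-8.
    """
    # Get the raw bytes without any text decoding applied yet.
    raw = urllib.parse.unquote_to_bytes(quoted)
    # "surrogatepass" lets the UTF-8 decoder accept encoded surrogates
    # and yields a str containing lone surrogate code points.
    with_surrogates = raw.decode("utf-8", errors="surrogatepass")
    # Encoding to UTF-16 writes the surrogates as plain code units;
    # decoding that again combines each high/low pair into one character.
    utf16 = with_surrogates.encode("utf-16", errors="surrogatepass")
    return utf16.decode("utf-16")


if __name__ == "__main__":
    print(decode_cesu8_filename("%ed%a1%85%ed%b7%97.svg"))
    print(decode_cesu8_filename("%e8%b7%af.svg"))
```

Input that is already well-formed UTF-8, such as %e8%b7%af.svg, passes through this unchanged, so the same function can handle both cases.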