
So I have the following string

"%E3%83%9C%E3%83%89%E3%82%AB%E3%81%95%E3%82%93"

It actually means this

ボドカさん

The hex pairs look like UTF-8 bytes, because when I write them as a bytes literal in Python:

encoded_str = b'\xe3\x83\x9c\xe3\x83\x89\xe3\x82\xab\xe3\x81\x95\xe3\x82\x93'
print(encoded_str)
print(encoded_str.decode('utf-8'))

Here is the output I get

b'\xe3\x83\x9c\xe3\x83\x89\xe3\x82\xab\xe3\x81\x95\xe3\x82\x93'
ボドカさん
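A quick way to confirm the relationship, as a sketch using only the standard library: the %XX form appears to be URL percent-encoding of the UTF-8 bytes, so encoding the text to UTF-8 and percent-quoting it with `urllib.parse.quote` should reproduce the original string exactly.

```python
from urllib.parse import quote

text = "ボドカさん"

# UTF-8 bytes of the text: the same bytes as the b'...' literal above
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# Percent-encoding writes each non-safe byte as %XX (uppercase hex),
# which should reproduce the original string
percent = quote(text)
print(percent)
print(percent == "%E3%83%9C%E3%83%89%E3%82%AB%E3%81%95%E3%82%93")  # True
```

So the string is not "encoded in UTF-8" on its own; it is UTF-8 bytes written out as %XX escapes, and decoding means undoing both layers.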

Now I would like a script that decodes any string in this %XX format. Here is my code:

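import re

mystr = "%E3%83%9C%E3%83%89%E3%82%AB%E3%81%95%E3%82%93"
mystr = mystr.lower()
# Replace each '%' with a literal backslash followed by 'x'
mystr = re.sub('%', r'\\x', mystr)
# This encodes the *text* "\xe3..." (backslash, x, hex digits), not the bytes 0xE3...
encoded_str = bytes(mystr, "utf-8")

print(mystr)
print(encoded_str)
print(encoded_str.decode('utf-8'))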

Output:

\xe3\x83\x9c\xe3\x83\x89\xe3\x82\xab\xe3\x81\x95\xe3\x82\x93
b'\\xe3\\x83\\x9c\\xe3\\x83\\x89\\xe3\\x82\\xab\\xe3\\x81\\x95\\xe3\\x82\\x93'
\xe3\x83\x9c\xe3\x83\x89\xe3\x82\xab\xe3\x81\x95\xe3\x82\x93

I have tried many variations, but I can't find a way to turn my string into real bytes the way a b'...' literal does. The conversion always adds extra \ characters, which then break the decoding step as well.
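The extra backslashes come from the fact that re.sub builds ordinary text: each `\xe3` in mystr is four characters (a backslash, `x`, and two hex digits), not one byte. Escape sequences like `\xe3` are only interpreted by Python's parser inside a literal. The sketch below shows the difference, plus one possible workaround that asks the `unicode_escape` codec to interpret the escapes at runtime. It assumes the string contains nothing but `\xNN` escapes; any other backslash sequences would be interpreted too.

```python
escaped_text = r"\xe3\x83\x9c"  # what re.sub produced: plain characters
real_bytes = b"\xe3\x83\x9c"     # what a bytes literal produces

print(len(escaped_text))  # 12 characters
print(len(real_bytes))    # 3 bytes

# Workaround: unicode_escape turns each \xNN into the character U+00NN,
# then latin-1 maps those characters 1:1 back to byte values.
recovered = (
    escaped_text.encode("latin-1")
    .decode("unicode_escape")
    .encode("latin-1")
)
print(recovered == real_bytes)    # True
print(recovered.decode("utf-8"))  # ボ
```

This works but is roundabout; reading the hex digits directly, without building backslash escapes first, is simpler.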

I also tried every encoding available in Python for the bytes() function.

I need help, please. Thank you. (Stack Overflow banned me for that question, lol.)

Mash

1 Answer

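mystr = "%E3%83%9C%E3%83%89%E3%82%AB%E3%81%95%E3%82%93"
# Strip the '%' signs to get a plain hex string, then parse it into raw bytes
encoded_str = bytes.fromhex(mystr.replace('%', ''))
# Interpret those bytes as UTF-8 text
print(encoded_str.decode('utf-8'))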

Output:

ボドカさん
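bytes.fromhex works here because the input consists only of %XX escapes. For strings that mix escapes with plain characters (for example `Hello%20World`), stripping the `%` signs leaves non-hex characters and fromhex raises ValueError. The standard library's `urllib.parse.unquote` handles both cases and decodes as UTF-8 by default; a sketch:

```python
from urllib.parse import unquote, unquote_to_bytes

mystr = "%E3%83%9C%E3%83%89%E3%82%AB%E3%81%95%E3%82%93"

# Decode %XX escapes and interpret the bytes as UTF-8 (the default encoding)
print(unquote(mystr))            # ボドカさん

# Mixed input: plain characters pass through unchanged
print(unquote("Hello%20World"))  # Hello World

# If you want the raw bytes instead of text
raw = unquote_to_bytes(mystr)
print(raw.decode("utf-8"))       # ボドカさん

# bytes.fromhex rejects the mixed case
try:
    bytes.fromhex("Hello%20World".replace("%", ""))
except ValueError as err:
    print("fromhex failed:", err)
```

Note that unquote leaves `+` as-is; for HTML form data, where `+` stands for a space, `urllib.parse.unquote_plus` is the matching function.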
Crapicus
Thank you so much, I didn't know about the format expected to decode UTF-8 strings. – Mash Sep 07 '22 at 13:23