# function to split a sequence (e.g. a string) into chunks of `size` items;
# the last chunk is shorter if the length isn't a multiple of `size`
def chunk(iterable, size):
    idx = 0
    while idx < len(iterable):
        yield iterable[idx:idx+size]
        idx += size
# define the original string
orig_string = "003300340035"
# convert to string of codepoints
unicode_str = "".join(chr(int(codepoint, 16)) for codepoint in chunk(orig_string, 4))
print(unicode_str)
# 345
The line that builds unicode_str has several steps going on. To clarify:
- Separate the original string into chunks of 4 characters and iterate over them (`for codepoint in chunk(orig_string, 4)`)
- Convert each four-character string into an integer, assuming it's in base-16 (`int(codepoint, 16)`)
- Get the unicode character with the given integer codepoint (`chr()`)
- Join all the individual unicode characters back into a string (`"".join()`)
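To see what each step produces, the same conversion can be unrolled into intermediate variables. This is only an illustrative sketch: the names `chunks`, `codepoints` and `chars` are not part of the code above, and the chunking is done with a list comprehension so the snippet runs on its own.

```python
orig_string = "003300340035"

# Step 1: split the string into 4-character chunks
chunks = [orig_string[i:i + 4] for i in range(0, len(orig_string), 4)]

# Step 2: parse each chunk as a base-16 integer
codepoints = [int(c, 16) for c in chunks]

# Step 3: look up the unicode character for each integer codepoint
chars = [chr(cp) for cp in codepoints]

# Step 4: join the characters back into a single string
unicode_str = "".join(chars)

print(chunks)       # ['0033', '0034', '0035']
print(codepoints)   # [51, 52, 53]
print(chars)        # ['3', '4', '5']
print(unicode_str)  # 345
```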
Note that this only works if your input consists exclusively of 4-character (4 hex digit) unicode codepoints. Detecting such codepoints when they're mixed in with other text is a separate problem for a separate question.
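If you want the conversion to fail loudly on input that doesn't fit that assumption, one possible approach is to validate before decoding. The helper below is a hypothetical sketch (the name `decode_fixed_width_hex` and its `width` parameter are made up for illustration). It only checks the length and the character set; it doesn't try to detect codepoints mixed into other text.

```python
import string


def decode_fixed_width_hex(text, width=4):
    """Hypothetical helper: decode a string made only of fixed-width hex codepoints."""
    # The length must be a whole number of chunks
    if len(text) % width != 0:
        raise ValueError(f"length {len(text)} is not a multiple of {width}")
    # Every character must be a hex digit (this also rules out whitespace and signs)
    if any(ch not in string.hexdigits for ch in text):
        raise ValueError("input contains non-hexadecimal characters")
    return "".join(
        chr(int(text[i:i + width], 16)) for i in range(0, len(text), width)
    )


print(decode_fixed_width_hex("003300340035"))
# 345
```

Calling it with something like `"00330"` or `"00zz"` raises a `ValueError` instead of silently producing the wrong characters.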