Problem formulation/example
Consider the latin character á, which can be represented as:

- \xe1 in hex
- \u00e1 in 16-bit hex
- \U000000e1 in 32-bit hex
In the following code block, I'm decomposing the latin-1 character into an equivalent character with the accent removed (i.e. from á to a):
import unicodedata

decomposed = unicodedata.normalize('NFD', '\xe1')  # 'a' followed by a combining acute accent
encoded = decomposed.encode("utf-8")               # b'a\xcc\x81'
letter = chr(list(encoded)[0])                     # first byte is the plain 'a'
print(letter)
(Any of the three bullet-pointed formats could have been used as the second argument of unicodedata.normalize().)
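To illustrate that interchangeability, here's a short sketch (not from the original post) showing that all three escape forms denote the same one-character string and therefore decompose identically:

```python
import unicodedata

# In Python source, these escapes are just three notations for the
# same single-character string.
assert '\xe1' == '\u00e1' == '\U000000e1'

for form in ('\xe1', '\u00e1', '\U000000e1'):
    decomposed = unicodedata.normalize('NFD', form)
    # Indexing a bytes object yields an int, so list(...) isn't needed.
    letter = chr(decomposed.encode('utf-8')[0])
    print(letter)  # 'a' each time
```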
My issue
My issue is in trying to generalise this, so that the second argument to normalize() is an assigned variable rather than a literal. I'm struggling to do this without explicitly typing the string into the call, because of the escaped backslash.
Example attempt
latin = "á"
a = ascii(latin) # print(a) gives '\xe1'
decomposed = unicodedata.normalize('NFD', a)
encoded = decomposed.encode("utf-8")
letter = chr(list(encoded)[0])
This won't work because a is not the character itself: ascii() returns the six-character string '\xe1' (quote, backslash, x, e, 1, quote), with a literal backslash, rather than the one-character string '\xe1'.
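A small sketch (my own, not from the original code) making the failure visible: ascii() produces the printable representation of the string, quotes and literal backslash included, so the result compares unequal to the original character:

```python
latin = "á"
a = ascii(latin)

print(a)                # '\xe1' — printed with quotes and a real backslash
print(len(latin))       # 1: a single character
print(len(a))           # 6: quote, backslash, x, e, 1, quote
print(a == latin)       # False
print('\xe1' == latin)  # True: the escape is only source-code notation
```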
Other attempts to take the hex representation and construct a string by concatenating \x onto it won't work either, for the same reason: the backslash stays literal.
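For completeness, a sketch of that concatenation attempt (hypothetical variable names, not from the original): joining a literal backslash-x onto the hex digits yields four separate characters, not the one-character string the escape denotes in source code:

```python
hex_digits = format(ord("á"), "x")  # 'e1'
s = "\\x" + hex_digits              # literal backslash + 'x' + 'e' + '1'

print(len(s))       # 4 — four separate characters, not one
print(s == "\xe1")  # False
print(list(s))      # ['\\', 'x', 'e', '1']
```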