
I want to create a Unicode character from the string form of its hex code point. For example, the string s = '\u0222' is the 'Ȣ' character.

This works if I write the literal directly:

>>> s = '\u0222'
>>> print(s)
Ȣ

But if I try to build the escape by concatenation, I get:

>>> h = '0222'
>>> s = r'\u' + h
>>> print(s)
\u0222
>>> s
'\\u0222'

because, as the repr shows, the string holds a literal backslash followed by u0222 rather than the single character 'Ȣ'. How can I create the Unicode character from a hex string, or how can I enter a true single backslash?

John Doenut

2 Answers

This was a lot harder to solve than I initially expected:

code = '0222'
uni_code = r'\u' + code
s = uni_code.encode().decode('unicode_escape')
print(s)

Or, starting from bytes:

code = b'0222'
uni_code = rb'\u' + code  # raw bytes literal: b'\u' alone is an invalid escape and triggers a warning
s = uni_code.decode('unicode_escape')
print(s)
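The same escape-decoding idea can be wrapped in a small helper. This is only a sketch: the name char_from_hex_escape is hypothetical, and it assumes the input is a plain ASCII hex string. It uses the eight-digit \U form instead of \u so that code points above FFFF also work, since \u accepts exactly four hex digits.

```python
def char_from_hex_escape(code: str) -> str:
    """Build a \\UXXXXXXXX escape from a hex string and decode it.

    Hypothetical helper for illustration; assumes `code` holds only hex digits.
    """
    # \U requires exactly eight hex digits, so left-pad with zeros.
    escaped = '\\U' + code.zfill(8)
    # Encode to bytes, then let the unicode_escape codec interpret the escape.
    return escaped.encode('ascii').decode('unicode_escape')


print(char_from_hex_escape('0222'))   # the 'Ȣ' character from the question
print(char_from_hex_escape('1F600'))  # a code point outside the four-digit range
```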
  • Agreed, much harder than it looked. You beat me to it while I was testing! For OP, more detail and examples can be found in [this answer](https://stackoverflow.com/a/49754538/7835267) – G. Anderson May 21 '19 at 18:25
  • It's harder because building Unicode escape constants is not the most direct route. See the `chr()` function. – Mark Tolonen May 21 '19 at 20:47

The \u0222 syntax is only for string literals: Python converts it to a single Unicode code point when it parses the source. It is not meant to be assembled at run time. To produce a character from a code point, use the chr() function. The following works whether you start from a hex string or an integer:

>>> chr(int('0222',16)) # convert string to int base 16
'Ȣ'
>>> chr(0x222)          # or just pass an integer.
'Ȣ'

For reference, ord() is the inverse function:

>>> hex(ord('Ȣ'))
'0x222'
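As a rough sketch of how chr() and ord() fit together, the snippet below converts several hex strings to one string and back. The helper name chars_from_hex and the sample values are made up for illustration.

```python
def chars_from_hex(codes):
    """Join the characters named by an iterable of hex code point strings.

    Hypothetical helper for illustration only.
    """
    # int(code, 16) parses the hex digits; chr() maps the integer to a character.
    return ''.join(chr(int(code, 16)) for code in codes)


text = chars_from_hex(['0222', '48', '1F600'])
print(text)

# Round trip: ord() recovers each code point, formatted as at least four hex digits.
print([format(ord(ch), '04X') for ch in text])  # ['0222', '0048', '1F600']
```

Unlike the escape-based route, this needs no padding and raises ValueError for input that is not valid hex or is outside the Unicode range.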
Mark Tolonen