Create raw unicode character from hex string representation/enter single backslash

Question

I want to create a raw Unicode character from the hex string representation of its code point. For example, the string s = '\u0222' is the 'Ȣ' character.

Now, this works if I do

>>> s = '\u0222'
>>> print(s)
Ȣ

but if I try to build it by concatenation, it comes out as

>>> h = '0222'
>>> s = r'\u' + h
>>> print(s)
\u0222
>>> s
'\\u0222'

because, as can be seen, what's actually in the string is '\\u', not '\u'. How can I create the Unicode character from hex strings, or how can I enter a true single backslash?

score 3 · Answer 1 · answered May 21 '19 at 18:18

This was a lot harder to solve than I initially expected:

code = '0222'
uni_code = r'\u' + code
s = uni_code.encode().decode('unicode_escape')
print(s)

Or

code = b'0222'
uni_code = br'\u' + code  # raw bytes literal avoids an invalid-escape warning
s = uni_code.decode('unicode_escape')
print(s)
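
To show why the decode step is needed, here is a minimal sketch (the variable names `raw` and `decoded` are just for illustration). It assumes Python 3 and checks that the concatenated string already holds a single backslash, which repr() merely displays doubled, and that 'unicode_escape' interprets the escape at run time:

```python
# The raw string holds one real backslash; repr() shows it as '\\'.
raw = r'\u' + '0222'
assert len(raw) == 6       # backslash, 'u', and four hex digits
assert raw[0] == '\\'      # '\\' in source is a single backslash character

# Encoding to bytes and decoding with 'unicode_escape' interprets
# the six-character escape sequence as one code point.
decoded = raw.encode('ascii').decode('unicode_escape')
assert len(decoded) == 1
assert decoded == '\u0222'
```

So the problem was never a doubled backslash: the string is the literal text of an escape sequence, and something still has to interpret it.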

Error - Syntactical Remorse

Agreed, much harder than it looked. You beat me to it while I was testing! For the OP, more detail and examples can be found in [this answer](https://stackoverflow.com/a/49754538/7835267) – G. Anderson May 21 '19 at 18:25
It's harder because building Unicode escape constants is not the most direct route. See the `chr()` function. – Mark Tolonen May 21 '19 at 20:47

Mark Tolonen · Accepted Answer · 2019-05-21T20:49:26.593

2

The \u0222 syntax is only for string literals: when Python parses the source, it turns that escape into a single Unicode code point. It's not meant to be constructed manually at run time. Use the chr() function to get the character for a Unicode code point. The following works whether you start from a hex string or an integer:

>>> chr(int('0222',16)) # convert string to int base 16
'Ȣ'
>>> chr(0x222)          # or just pass an integer.
'Ȣ'

And FYI, ord() is the complementary function, returning the code point of a character:

>>> hex(ord('Ȣ'))
'0x222'
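
As a rough sketch, the two calls above can be wrapped into a pair of helpers. The names `char_from_hex` and `hex_from_char` are hypothetical, not standard-library functions, and the four-digit zero padding is just one possible choice:

```python
def char_from_hex(hex_digits: str) -> str:
    """Return the character whose code point is given as a hex string."""
    code_point = int(hex_digits, 16)  # ValueError if not valid hex
    return chr(code_point)            # ValueError if outside 0..0x10FFFF


def hex_from_char(char: str) -> str:
    """Return the code point of one character as uppercase hex, padded to 4 digits."""
    return format(ord(char), '04X')


assert char_from_hex('0222') == '\u0222'
assert hex_from_char('\u0222') == '0222'

# Works beyond four hex digits too, where a \uXXXX escape would not fit.
assert char_from_hex('1F600') == '\U0001F600'
assert hex_from_char('\U0001F600') == '1F600'
```

Unlike building a \u escape by hand, this does not depend on the number of hex digits in the input.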

edited May 21 '19 at 20:49

answered May 21 '19 at 20:43

Mark Tolonen

Much better answer. I figured there was something that did this but I didn't know what it was. – Error - Syntactical Remorse May 21 '19 at 20:48
