I'm using Python 3 to minimize the pain of dealing with Unicode. I can print a character with a four-digit code point like this:
>>> print (u'\u1010')
တ
But when I try to do the same with a character whose code point needs five hex digits, say U+20000, u'\u20000' turns out to be the wrong way to write it:
>>> print (u'\u20000')
0
>>> print (list(u'\u20000'))
['\u2000', '0']
It is read as two characters instead: U+2000 followed by the digit '0'.
I've also tried the capital U escape, i.e. u'\U20000', but it raises an escape error:
>>> print (u'\U20000')
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-6: truncated \UXXXXXXXX escape
Using a capital U as the string prefix didn't work either:
>>> print (U'\u20000')
0
>>> print (U'\U20000')
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-6: truncated \UXXXXXXXX escape
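For reference, the behaviour in the transcripts above is consistent with Python's escape rules: \u consumes exactly four hex digits and \U exactly eight, so a five-digit code point has to be zero-padded. Here is a minimal sketch of what I would expect to work, assuming Python 3; the variable names are only illustrative:

```python
# \u takes exactly four hex digits, \U takes exactly eight,
# so U+20000 must be zero-padded to eight digits.
ch = '\U00020000'

# chr() builds the same character from the integer code point.
same = chr(0x20000)

# With \u, only the first four digits are consumed; the trailing
# '0' is left over as an ordinary character.
two = '\u20000'

# Encoding to UTF-16 (big-endian, no BOM) yields a surrogate pair.
utf16 = ch.encode('utf-16-be')

print(len(ch), hex(ord(ch)))
print(len(two), [hex(ord(c)) for c in two])
```

The u or U prefix on the string literal makes no difference in Python 3; only the escape inside the quotes matters.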