ValueError: unichr() arg not in range(0x10000) (narrow Python build)

Question

I am trying to convert an HTML entity to a Unicode character. The entity is &#976918;, and when I try the following:

unichr(int(976918))

I get this error:

ValueError: unichr() arg not in range(0x10000) (narrow Python build)

It seems the value is outside the range that unichr() can convert.

Eryk Sun · Accepted Answer · 2014-07-27T14:50:18.333

29

You can decode a string that has a Unicode escape (\U followed by 8 hex digits, zero-padded) using the "unicode-escape" encoding:

>>> s = "\\U%08x" % 976918
>>> s
'\\U000ee816'

>>> c = s.decode('unicode-escape')
>>> c
u'\U000ee816'
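The two steps above can be wrapped in a small helper. This is only a sketch: the name `codepoint_to_text` is made up for illustration and is not a standard library function. Encoding the escape to ASCII bytes first is an assumption made so that the same code also runs on Python 3, where `str` has no `decode` method.

```python
def codepoint_to_text(cp):
    """Return the text for code point cp without calling unichr()/chr()."""
    # Build the 8-digit, zero-padded escape, e.g. '\\U000ee816'.
    escaped = "\\U%08x" % cp
    # The unicode-escape codec decodes bytes, so go through ASCII first.
    return escaped.encode("ascii").decode("unicode-escape")


text = codepoint_to_text(976918)
```

On Python 2 the `encode("ascii")` call is a no-op for this input, so the helper behaves like the interactive session above.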

On a narrow build it's stored as a UTF-16 surrogate pair:

>>> list(c)
[u'\udb7a', u'\udc16']
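The two values in that list follow from the standard UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A minimal sketch, with `to_surrogate_pair` being a hypothetical helper name:

```python
def to_surrogate_pair(cp):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(976918)
print("%04x %04x" % (high, low))  # db7a dc16
```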

This surrogate pair is processed correctly as a single code point during encoding:

>>> c.encode('utf-8')
'\xf3\xae\xa0\x96'

>>> '\xf3\xae\xa0\x96'.decode('utf-8')
u'\U000ee816'
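The four bytes above can be reproduced by hand, which confirms that the encoder works from the code point 0xEE816 and not from the two surrogates separately. The helper below is an illustrative sketch (the name `utf8_bytes_4` is made up) and only handles the 4-byte case:

```python
def utf8_bytes_4(cp):
    """Return the four UTF-8 byte values for a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only 4-byte sequences are handled here")
    return [
        0xF0 | (cp >> 18),           # 11110xxx: top 3 bits
        0x80 | ((cp >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((cp >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (cp & 0x3F),          # 10xxxxxx: last 6 bits
    ]


print(["%02x" % b for b in utf8_bytes_4(976918)])  # ['f3', 'ae', 'a0', '96']
```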

edited Jul 27 '14 at 14:50

answered Aug 18 '11 at 12:21


To convert 976918 to 000ee816 do `hex(976918)[2:].zfill(8)` – EoghanM Jul 27 '14 at 11:38

score 13 · Answer 2 · answered Feb 04 '15 at 16:40

Here's an alternate workaround that I developed with the struct module.

import struct

def unichar(i):
    try:
        return unichr(i)
    except ValueError:
        # Narrow build: pack the code point as 4 bytes and decode it as UTF-32.
        return struct.pack('i', i).decode('utf-32')

>>> unichar(int('976918'))
u'\U000ee816'
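The function above relies on `struct.pack('i', ...)` and the BOM-less `'utf-32'` codec both defaulting to the machine's native byte order. A variant that pins the byte order explicitly might look like the sketch below. The name `unichar_le` is hypothetical, and choosing little-endian is an arbitrary assumption; big-endian (`'>I'` with `'utf-32-be'`) would work equally well.

```python
import struct


def unichar_le(i):
    """Decode code point i via UTF-32 with an explicit little-endian byte order."""
    # '<I' packs an unsigned 32-bit integer, little-endian, on every platform.
    return struct.pack("<I", i).decode("utf-32-le")


c = unichar_le(976918)
```

This form also runs unchanged on Python 3, where `unichr` no longer exists.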

Mark Ransom


score 6 · Answer 3 · answered Aug 18 '11 at 10:25

In order for this to work, you either need to build Python yourself, specifying

./configure --enable-unicode=ucs4

before compiling, or else you need to move to Python 3.

Even if you do this, there are apparently problems on Windows, which will be fixed in the next version of Python (3.3).
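To find out which kind of build is running before choosing a workaround, `sys.maxunicode` can be inspected. A small sketch, with `is_narrow_build` being a made-up helper name:

```python
import sys


def is_narrow_build():
    """True when this interpreter stores text as UTF-16 code units."""
    # Narrow builds report 0xFFFF; wide builds and Python 3.3+ report 0x10FFFF.
    return sys.maxunicode == 0xFFFF


print(is_narrow_build())
```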
