
Below are two Python interpreter sessions. The first is from Python on CentOS; the second is from the built-in Python on Mac OS X 10.7. Why does the second session create strings of length two from the \U escape sequence, and then raise an error when ord() is called on them?

$ python
Python 2.6.6 (r266:84292, Dec  7 2011, 20:48:22) 
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u'\U00000020'
u' '
>>> u'\U00000065'
u'e'
>>> u'\U0000FFFF'
u'\uffff'
>>> u'\U00010000'
u'\U00010000'
>>> len(u'\U00010000')
1
>>> ord(u'\U00010000')
65536

$ python
Python 2.6.7 (r267:88850, Jul 31 2011, 19:30:54) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
>>> u'\U00000020'
u' '
>>> u'\U00000065'
u'e'
>>> u'\U0000FFFF'
u'\uffff'
>>> u'\U00010000'
u'\U00010000'
>>> len(u'\U00010000')
2
>>> ord(u'\U00010000')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found
audiodude

1 Answer


I'm not at all sure about this, but your Mac OS X system may use a "narrow build" of Python, which stores unicode strings internally as 16-bit code units and represents code points above 0xFFFF (that is, 2**16 and up) as a pair of surrogate code units. That would explain len(u'\U00010000') == 2.
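The arithmetic behind that character pair can be sketched in a few lines. This is a minimal illustration of standard UTF-16 surrogate encoding, not code taken from the interpreter; the function names to_surrogate_pair and from_surrogate_pair are made up for this example.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in the range U+10000..U+10FFFF")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For U+10000 the pair is 0xD800 and 0xDC00. Those are the two code units a narrow build would store, which is why len() reports 2 and ord() refuses the string.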

Try unichr(0x10000) on OS X and see if you get an error referring to narrow builds. See also "What encoding do normal python strings use?", in particular IVH's answer.
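Another way to check, and to work around the ord() failure, might look like the sketch below. It relies on sys.maxunicode, which is 0xFFFF on narrow builds and 0x10FFFF on wide builds; the helper names is_narrow_build and code_point are hypothetical, chosen only for this example. Note that Python 3.3 and later no longer have narrow builds, so the first function returns False there.

```python
import sys


def is_narrow_build():
    # Narrow builds cap sys.maxunicode at 0xFFFF; wide builds report 0x10FFFF.
    return sys.maxunicode == 0xFFFF


def code_point(text):
    """Like ord(), but also accepts a two-unit surrogate pair."""
    if len(text) == 1:
        return ord(text)
    if len(text) == 2:
        high, low = ord(text[0]), ord(text[1])
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            # Recombine the pair into the original code point.
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    raise TypeError("expected a single character or one surrogate pair")
```

On the OS X session from the question, code_point(u'\U00010000') should return 65536 even though ord() raises a TypeError.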

It's possible to recompile Python as a wide build (in Python 2, by passing --enable-unicode=ucs4 to configure) even if the default Python on your system is a narrow build.

Justin Blank
  • Good catch. That's probably it. See this article too: http://wordaligned.org/articles/narrow-python – dda Jun 07 '12 at 04:27
  • This is the right answer. I get the error about "narrow Python build" and sys.maxunicode returns 65535 on Mac OS X. – audiodude Jun 07 '12 at 14:32
  • @user802500: I might be misunderstanding, but isn't it Mac OS that has the narrow build in this case? – Nails N. Jun 07 '12 at 15:05
  • You're right. I'd flipped which OS was doing what when I was answering the post. It's edited now. – Justin Blank Jun 07 '12 at 15:37