The problem is probably due to using a "narrow build" of Python 2. That is, if you fire up your interpreter, you'll find that sys.maxunicode == 0xffff is True.
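A quick way to check which kind of build you have, as a minimal sketch (the helper name is_narrow_build is my own, not something from the standard library):

```python
import sys


def is_narrow_build():
    """Return True if this interpreter is a narrow (UTF-16) build.

    Narrow builds report sys.maxunicode == 0xFFFF; wide builds and
    Python >= 3.3 report 0x10FFFF.
    """
    return sys.maxunicode == 0xFFFF


if __name__ == "__main__":
    print("narrow build" if is_narrow_build() else "wide build (or Python >= 3.3)")
```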
This site has a few interesting notes on wide builds of Python (which are commonly found on Linux, but not, as the link suggests, on OS X in my experience). These builds use UCS-4 internally to encode characters, and as a result seem to have saner support for higher-range Unicode code points, such as the ranges you are talking about. Narrow builds apparently use UTF-16 internally, and as a result encode these higher code points using "surrogate pairs". I presume this is the reason you see a bad character range error when you try to compile this regular expression.
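To illustrate why surrogate pairs would produce that error, here is a rough sketch of the UTF-16 arithmetic. The endpoints U+1F600 and U+1F64F are only an assumed example, since I don't know your exact ranges, and to_surrogate_pair is a hypothetical helper. On a narrow build, a class like [start-end] is seen as four code units, so the "range" actually runs from the low surrogate of the start to the high surrogate of the end, which is backwards.

```python
def to_surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in the range U+10000..U+10FFFF")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# Assumed example endpoints; substitute the ones from your own pattern.
start = to_surrogate_pair(0x1F600)
end = to_surrogate_pair(0x1F64F)

# A narrow build reads the class as: start_high, (start_low - end_high), end_low.
# Low surrogates (0xDC00..0xDFFF) always sort after high surrogates
# (0xD800..0xDBFF), so the middle range is reversed.
range_is_reversed = start[1] > end[0]
```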
The only solutions I know of are to switch, if you can, to Python >= 3.3, which no longer has the wide/narrow distinction, or to install a wide Python build.
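For what it's worth, a small sketch of what this looks like in practice. The character class below is an assumed example, not your actual pattern, and compile_astral_class is a name I made up. It returns None on a narrow build rather than attempting a compile that would be expected to fail.

```python
import re
import sys

# Assumed example range; replace it with the ranges from your own pattern.
ASTRAL_CLASS = u"[\U0001F600-\U0001F64F]"


def compile_astral_class():
    """Compile the character class, or return None on a narrow build."""
    if sys.maxunicode == 0xFFFF:
        # Narrow build: the class would be split into surrogates.
        return None
    return re.compile(ASTRAL_CLASS)
```

On a wide build or Python >= 3.3, the returned pattern matches a single character in the range, such as u"\U0001F600".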