Using Python 2.7, re
I'm trying to compile unicode character classes. I can get it to work with 4 digit ranges (u'\uxxxx') but not 8 digits (u'\Uxxxxxxxx').I
The following works:
re.compile(u'[\u0010-\u0012]')
The following does not:
re.compile(u'[\U00010000-\U00010001]')
The resultant error is:
Traceback (most recent call last): File "", line 1, in File "C:\Python27\lib\re.py", line 190, in compile return _compile(pattern, flags) File "C:\Python27\lib\re.py", line 242, in _compile raise error, v # invalid expression error: bad character range
It appears to be an issue with 8 digit ranges only as the following works:
re.compile(u'\U00010000')
Separate question, I am new to stackoverflow and I am really struggling with how to post questions. I would expect that Trackback to appear on multiple lines, not on one line. I would also like to be able to paste in content copied from the interpreter but this UI makes a mess out of '>>>'
Don't know how to add this in a comment editing question.
The expression I really want to compile is:
re.compile(u'[\U00010000-\U0010FFFF]')
Expanding it with list(u'[\U00010000-\U0010FFFF]') looks pretty intractable as far as extending the suggested workaround:
>>> list(u'[\U00010000-\U0010FFFF]')
[u'[', u'\ud800', u'\udc00', u'-', u'\udbff', u'\udfff', u']']