I am reading regular expressions from a file and generally have had no problems until this one came along:
^X.{0,2}[\u2E80-\u9FFF] # \u2E80-\u9FFF matches most Chinese and Japanese characters
The regex works fine when it is compiled from a string literal in the source:
p = re.compile(u'^X.{0,2}[\u2E80-\u9FFF]', re.IGNORECASE | re.UNICODE)
print p.search(u'XFlowers for you')
>> None
print p.search(u'X桜桜桜桜')
>> <match object>
However, the character range specifier apparently gets garbled when the pattern is read from the file, because the compiled pattern then matches just about anything starting with X:
import codecs
import re

f = codecs.open(filename, "r", "utf-8")
lines = f.read().splitlines()
ignorePatterns = FileHelper.fileToList(ignoreFile)
patternList = [re.compile(x, re.IGNORECASE | re.UNICODE) for x in ignorePatterns]
for name in [u'XFlowers for you', u'X桜桜桜桜']:
    for pattern in patternList:
        print pattern.search(name)
This matches both strings, although only the second one should match.
Does anyone know how to solve this one? Thanks!
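
A possible explanation, offered as an assumption rather than something confirmed above: in the u'...' literal the Python parser turns \u2E80 into a single character before re ever sees it, whereas a line read from the file still contains the six literal characters backslash, u, 2, E, 8, 0. The print statements suggest Python 2, whose re module does not interpret \uXXXX escapes itself, so the character class would end up holding ordinary ASCII characters and ranges. The sketch below expands such escapes before compiling; expand_unicode_escapes and raw_line are hypothetical names introduced only for illustration.

```python
# -*- coding: utf-8 -*-
import re

try:
    unichr          # exists on Python 2
except NameError:
    unichr = chr    # Python 3 equivalent


def expand_unicode_escapes(text):
    # Hypothetical helper (not from the original post): replace each
    # literal backslash-u-XXXX sequence with the character it names.
    return re.sub(r'\\u([0-9A-Fa-f]{4})',
                  lambda m: unichr(int(m.group(1), 16)),
                  text)


# Assumed file content: the escapes are plain text, so the backslash
# is doubled here to keep it literal inside the Python string.
raw_line = u'^X.{0,2}[\\u2E80-\\u9FFF]'

pattern = re.compile(expand_unicode_escapes(raw_line),
                     re.IGNORECASE | re.UNICODE)

no_match = pattern.search(u'XFlowers for you')   # expected: None
match = pattern.search(u'X桜桜桜桜')              # expected: a match object
```

This only handles four-digit \uXXXX escapes and leaves every other backslash sequence (\d, \s and so on) untouched, which is why it uses a targeted substitution instead of decoding the whole line.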