Using Python 2.7.
I am trying to write a regex that recognizes any Unicode digit 0-9 (not just Arabic numerals, but Simplified Chinese numerals as well) and any Unicode word character.
For example, I have:
4_1424336,P-九
(九 is Chinese for 9.)
And I want to return:
9_9999999,A-9
My current function is:
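import re

def multiple_replace(myString):
    # Replace every letter (word character that is not a digit, '_' or '*') with 'A'
    myString = re.sub(ur'(?u)[^\W_*\d]', u'A', myString)
    # Replace every digit with '9'
    myString = re.sub(ur'(?u)[\d]', u'9', myString)
    return myString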
EDIT:
I also tried the following, with the same result:
def multiple_replace(myString):
    myLetters_regex = re.compile(r'[^\W\d_]', re.UNICODE)
    myNumbers_regex = re.compile(r'[\d]', re.UNICODE)
    myString = myNumbers_regex.sub('9', myString)
    myString = myLetters_regex.sub('A', myString)
    return myString
and I get:
9_9999999,A-A (i.e. 九 is flagged as an 'A' instead of a '9')
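To narrow this down, here is a small diagnostic sketch that checks how Python classifies the two characters. It relies only on the standard `re` and `unicodedata` modules; the helper name `inspect_char` is made up for illustration. My understanding is that `\d` with `re.UNICODE` only matches characters in the Unicode "decimal digit" category (Nd), so the general category should show which side of the split 九 falls on.

```python
# -*- coding: utf-8 -*-
import re
import unicodedata

DIGIT_RE = re.compile(r'\d', re.UNICODE)

def inspect_char(ch):
    # Hypothetical helper: returns the Unicode general category, whether \d
    # matches the character, and its numeric value (None if it has none).
    return (unicodedata.category(ch),
            DIGIT_RE.match(ch) is not None,
            unicodedata.numeric(ch, None))

for ch in (u'4', u'九'):
    print(inspect_char(ch))
```

If 九 comes back with a letter category (Lo) and no `\d` match while still having a numeric value, that would explain why it ends up replaced with 'A'.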
So, my questions are:
1) Is there any other way to write \w so that it does NOT include the numerics among the alphanumerics?
2) Is there something I am missing about recognizing Chinese numerals using Python regex?
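One direction that might work, sketched here and not a confirmed solution: match every alphanumeric character with a single pattern and let a callback decide between '9' and 'A' based on `unicodedata.numeric`. The names `ALNUM_RE`, `_mask_char` and `multiple_replace_numeric` are made up for this sketch, and it assumes that any character carrying a Unicode numeric value should count as a number.

```python
# -*- coding: utf-8 -*-
import re
import unicodedata

# Any word character except the underscore (letters and digits).
ALNUM_RE = re.compile(r'[^\W_]', re.UNICODE)

def _mask_char(match):
    ch = match.group(0)
    # Assumption: a character with a Unicode numeric value is treated as a number.
    if unicodedata.numeric(ch, None) is not None:
        return u'9'
    return u'A'

def multiple_replace_numeric(my_string):
    # Single pass: each alphanumeric character becomes '9' or 'A'.
    return ALNUM_RE.sub(_mask_char, my_string)

print(multiple_replace_numeric(u'4_1424336,P-九'))  # 9_9999999,A-9
```

A caveat with this assumption: characters such as 十 (10) or 百 (100) also have numeric values, so they would be masked as '9' too, which may or may not be what is wanted.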