Possible Duplicate:
matching unicode characters in python regular expressions
Using
re.findall(r'\w+', ip)
on Fältskog returns F and ltskog. I tried with both plain strings and unicode strings, but got the same result.
You need to set the appropriate flag (in this case re.UNICODE) to tell re what \w means:
re.findall(r'\w+', ip, re.UNICODE)
# EDIT
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.findall(r"\w+", u"Fältskog", re.UNICODE)
[u'F\xe4ltskog']
>>>
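A minimal sketch of why the split happens and how decoding plus re.UNICODE avoids it. It assumes the input arrives as UTF-8 encoded bytes, which the question does not state; the names raw, text, byte_words and text_words are illustrative only.

```python
# -*- coding: utf-8 -*-
import re

# Assumed input: UTF-8 bytes for "Fältskog"; the "ä" is the two bytes \xc3\xa4.
raw = b"F\xc3\xa4ltskog"

# Matching on bytes: \w only covers ASCII letters, digits and "_",
# so the two bytes of "ä" act as a separator and split the word.
byte_words = re.findall(br"\w+", raw)

# Decode to text first, then let re.UNICODE define \w by Unicode properties.
text = raw.decode("utf-8")
text_words = re.findall(r"\w+", text, re.UNICODE)

print(byte_words)
print(text_words)
```

The key step is decoding before matching: the flag only helps when the regex sees characters rather than encoded bytes.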
You can also list the extra characters explicitly in the character class if you want the pattern to be more visual:
re.findall(r'[åäöÅÄÖ\w]+', ip)
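A hedged sketch of the explicit character class applied to decoded text. It assumes the input is already a unicode string; with a byte string, the class would contain the individual bytes of each encoded letter rather than the letters themselves. The names extra, pattern and words are illustrative.

```python
import re

# Escapes for å ä ö Å Ä Ö, so the source file needs no encoding declaration.
extra = u"\xe5\xe4\xf6\xc5\xc4\xd6"

# Character class: the listed letters plus whatever \w already matches.
pattern = re.compile(u"[" + extra + u"\\w]+")

words = pattern.findall(u"Agnetha F\xe4ltskog")
print(words)
```

This only guarantees the letters that are listed; re.UNICODE is the more general choice when the input may contain other non-ASCII letters.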