
Possible Duplicate:
matching unicode characters in python regular expressions

Using

re.findall(r'\w+', ip)

on Fältskog returns F and ltskog. I tried with both byte strings and unicode strings, but got the same result.

Jesvin Jose
    You need to specify the re.LOCALE or re.UNICODE flag: re.LOCALE if you want \w to depend on the current locale; otherwise re.UNICODE, which makes \w match alphanumeric characters in all languages. – nhahtdh Sep 22 '12 at 07:01

2 Answers


You need to set the appropriate flag (in this case re.UNICODE, which tells re what \w should match):

re.findall(r'\w+', ip, re.UNICODE)

# EDIT

Python 2.7.3 (default, Aug  1 2012, 05:16:07) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.findall(r"\w+", u"Fältskog", re.UNICODE)
[u'F\xe4ltskog']
>>> 
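
For anyone reading this on Python 3, here is a minimal sketch of the same check. It assumes Python 3, where str patterns are Unicode-aware by default, so re.UNICODE is redundant there; re.ASCII is used only to reproduce the split from the question. The variable names are just for illustration.

```python
import re

name = "Fältskog"

# Python 3: str patterns are Unicode-aware by default, so \w covers "ä".
unicode_words = re.findall(r"\w+", name)

# re.ASCII restricts \w to [a-zA-Z0-9_], which reproduces the split
# described in the question.
ascii_words = re.findall(r"\w+", name, re.ASCII)

print(unicode_words)  # ['Fältskog']
print(ascii_words)    # ['F', 'ltskog']
```

This assumes the "ä" is stored as a single precomposed code point; a decomposed "a" plus combining diaeresis would probably still be split.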
Sean Vieira

re.findall(r'[åäöÅÄÖ\w]+', ip)

You can also list the extra letters explicitly in a character class like this, if you want the pattern to show exactly which characters it accepts.
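
A rough sketch of the trade-off with the explicit class, assuming Python 3: re.ASCII is added here only so that \w stays ASCII-only and the effect of the listed letters is visible. The second name is a made-up test input, not from the question.

```python
import re

# Explicit Swedish letters plus an ASCII-only \w.
pattern = re.compile(r"[åäöÅÄÖ\w]+", re.ASCII)

# "ä" is listed in the class, so the name stays in one piece.
listed = pattern.findall("Fältskog")

# "ü" is not listed, so the word is split around it.
unlisted = pattern.findall("Müller")

print(listed)    # ['Fältskog']
print(unlisted)  # ['M', 'ller']
```

So the explicit class only covers the letters you remember to list, whereas the Unicode-aware \w handles other alphabets too.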

Pablo Jomer