4

In German text, the umlauts (ä, ö, ü) and the eszett (ß) are regular letters, but the \w character class doesn't seem to match them:

In [1]: re.match('(\w+)', 'Straße').groups()
Out[1]: ('Stra',)

Passing the re.UNICODE flag to re.match doesn't change anything.

Is there any better way to match a full word other than with [a-zA-ZäüöÄÜÖß]+?

unwind
elpres

1 Answer

7

Since you are using Python 2, you need to use unicode strings (the u prefix) together with the re.UNICODE flag:

print re.match(ur'(\w+)',u'Straße',re.UNICODE).groups()[0]
Straße
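
For comparison, here is a minimal sketch of the same match under Python 3. This is an assumption on my part, since the question is about Python 2. In Python 3, str is already Unicode and \w matches Unicode word characters by default, so neither the u prefix nor re.UNICODE is needed. The names word and ascii_word are only illustrative.

```python
import re

# Python 3: str is Unicode, so \w covers ä, ö, ü and ß without extra flags.
word = re.match(r'(\w+)', 'Straße').group(1)
print(word)        # Straße

# re.ASCII restricts \w to [a-zA-Z0-9_], reproducing the truncated match
# from the question.
ascii_word = re.match(r'(\w+)', 'Straße', re.ASCII).group(1)
print(ascii_word)  # Stra
```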
Keozon