In German text, umlauts (ä, ü, ö) and eszett (ß) are regular letters, but they don't seem to be matched by the \w special sequence:
In [1]: re.match('(\w+)', 'Straße').groups()
Out[1]: ('Stra',)
Passing the re.UNICODE flag to re.match doesn't change anything.
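A minimal sketch of the behaviour, assuming Python 3 (the interpreter version of the session above isn't stated). The `('Stra',)` result is consistent with the pattern being applied to UTF-8 encoded bytes rather than to decoded text. That is an assumption about the original session, not something it confirms. The sketch contrasts a str subject with a bytes subject and shows that re.UNICODE can't be combined with a bytes pattern in Python 3:

```python
import re

word = "Straße"

# str pattern on a str subject: in Python 3, \w is Unicode-aware by default,
# so "ß" counts as a word character
text_match = re.match(r"(\w+)", word)
print(text_match.groups())  # ('Straße',)

# bytes pattern on UTF-8 bytes: \w only covers ASCII [a-zA-Z0-9_],
# so matching stops at the first byte of the encoded "ß" (b"\xc3\x9f")
byte_match = re.match(rb"(\w+)", word.encode("utf-8"))
print(byte_match.groups())  # (b'Stra',)

# re.UNICODE cannot rescue a bytes pattern; Python 3 rejects the combination
try:
    re.match(rb"(\w+)", word.encode("utf-8"), re.UNICODE)
    unicode_flag_rejected = False
except ValueError:
    unicode_flag_rejected = True
print(unicode_flag_rejected)  # True
```

In other words, the flag only matters when the subject is text. If the data arrives as bytes, decoding it first (for example with `.decode("utf-8")`, assuming the input is UTF-8) is what makes \w see "ß" as a single letter.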
Is there any better way to match a full word than with [a-zA-ZäüöÄÜÖß]+?
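One possible alternative, sketched under the assumption of Python 3 with str patterns and subjects: the character class [^\W\d_] means "word characters that are neither digits nor underscore", which amounts to Unicode letters. The sample sentence below is made up for illustration. It compares that class with the hand-written one, which silently misses letters outside its list, such as é:

```python
import re

# Hypothetical sample text, chosen to include a letter outside the explicit list
sample = "Straße, Größe und Café 42 foo_bar"

# Hand-written class from the question: only the listed letters are matched
explicit = re.findall(r"[a-zA-ZäüöÄÜÖß]+", sample)

# [^\W\d_]: word characters minus digits and underscore, i.e. letters,
# because \w is Unicode-aware for str patterns in Python 3
letters_only = re.findall(r"[^\W\d_]+", sample)

print(explicit)      # ['Straße', 'Größe', 'und', 'Caf', 'foo', 'bar']
print(letters_only)  # ['Straße', 'Größe', 'und', 'Café', 'foo', 'bar']
```

If digits and underscores should count as part of a word, plain \w+ on a str subject is enough. [^\W\d_] only narrows it down to letters.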