I am currently trying to figure out how to use Unicode in a regex in Python.
The regex I want to get to work is the following:
r"([A-ZÜÖÄß]+\s)+"
This should match all occurrences of multiple capitalized words, which may or may not contain umlauts. Funnily enough, it does nearly what I want, but it still ignores umlauts.
For example, in FUßBALL AND MORE
only BALL AND MORE
is detected.
I already tried simply using the Unicode escapes (Ü
becomes \u00DC
etc.), as advised in another thread, but that does not work either. I might try the "regex" library instead of "re", but I would kind of like to know what I am doing wrong right now.
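For reference, here is a minimal sketch of how the two variants of the pattern can be checked side by side. The helper name `find_caps_runs` and the sample string are illustrative assumptions, not my exact script. It assumes Python 3, where `str` and `re` patterns are Unicode by default; on Python 2 or with byte strings the behaviour may well differ, which could be part of my problem.

```python
import re

# Variant 1: the literal characters in the character class.
LITERAL = re.compile(r"([A-ZÜÖÄß]+\s)+")

# Variant 2: the same class written with \uXXXX escapes
# (the re module resolves these itself in str patterns on Python 3.3+).
ESCAPED = re.compile(r"([A-Z\u00DC\u00D6\u00C4\u00DF]+\s)+")


def find_caps_runs(pattern, text):
    """Return every full match of `pattern` in `text` (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]


# Illustrative sample; the trailing space matters because each word
# in the pattern must be followed by whitespace.
sample = "FUßBALL AND MORE "

print(repr(find_caps_runs(LITERAL, sample)))
print(repr(find_caps_runs(ESCAPED, sample)))
```

Printing with `repr` makes it easier to see whether the ß ends up inside the match or splits it.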
If you are able to enlighten me, please feel free to do so.