Regex Get All Alphabetic characters

Question

I want something like [A-z] that matches all alphabetic characters, plus characters like ö, ä, ü, etc.

If I use [A-ü], I probably get all the special characters used by Latin languages, but it also allows other characters like ¿¿]|{}[¢§øæ¬°µ©¥

Example: https://regex101.com/r/tN9gA5/2

Edit: I need this in Python 2.
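A minimal sketch of the [A-ü] problem described above, written so it runs on both Python 2.7 and Python 3 (the name too_wide and the sample strings are arbitrary). A character-class range is a range of code points, so [A-ü] covers everything from U+0041 to U+00FC, not just letters:

```python
# -*- coding: utf-8 -*-
import re

# [A-ü] is a code-point range from "A" (U+0041) to "ü" (U+00FC). Besides
# letters it covers ASCII punctuation such as [ ] { | } and Latin-1 symbols
# such as ¢ § ¿ (U+00A1..U+00BF).
too_wide = re.compile(u'^[A-ü]+$', re.UNICODE)

samples = [u'Testöäü', u'¿]|{}[', u'¢§¬°µ©¥']
matches = [too_wide.match(s) is not None for s in samples]

print(matches)  # [True, True, True]: the symbol strings pass as well
```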

score 4 · Answer 1 · answered Apr 08 '15 at 07:33

Depending on which regular expression engine you are using, you could use the regular expression ^\p{L}+$. \p{L} matches any Unicode letter:

In addition to complications, Unicode also brings new possibilities. One is that each Unicode character belongs to a certain category. You can match a single character belonging to the "letter" category with \p{L}

Source

This example should illustrate what I mean. The regex engine on Regex101 does support this; you just need to select PCRE (PHP) from the top left.

This is probably the best idea, but I am using Python and the standard regex library (re) does not support this. — yamm, Apr 08 '15 at 09:04
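Since the built-in re module has no \p{L}, one possible workaround (not from either answer) is to do the category test by hand with the standard unicodedata module: \p{L} matches any character whose Unicode general category starts with "L". A minimal sketch, assuming you only need to check whole strings; the helper names is_unicode_letter and all_letters are hypothetical:

```python
# -*- coding: utf-8 -*-
import unicodedata


def is_unicode_letter(ch):
    # \p{L} covers the general categories Lu, Ll, Lt, Lm and Lo,
    # i.e. every category whose two-letter code starts with "L".
    return unicodedata.category(ch).startswith('L')


def all_letters(s):
    # Rough equivalent of ^\p{L}+$: non-empty, and every character is a letter.
    return len(s) > 0 and all(is_unicode_letter(ch) for ch in s)


print(all_letters(u'TestöäüéàèÉÀÈéàè'))  # True
print(all_letters(u'¿]|{}['))            # False
```

This works on Python 2.7 and Python 3, but it is a per-character loop rather than a regex, so it cannot be embedded in a larger pattern.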

score 1 · Accepted Answer · edited May 23 '17 at 12:28

When you use [A-z], you are not only matching the letters from "A" to "z"; you also match the six non-letter characters that lie between "Z" and "a" in ASCII: [ \ ] ^ _ `.
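A quick way to confirm this is to walk every ASCII character from "A" to "z" and keep the ones the class accepts that are not letters. This is only an illustration; the names covered and non_letters are arbitrary:

```python
import re

# Every ASCII character that the range [A-z] covers: "A" (0x41) through "z" (0x7A).
covered = [chr(c) for c in range(ord('A'), ord('z') + 1)]

# The characters the class accepts even though they are not letters.
non_letters = [ch for ch in covered
               if re.match(r'[A-z]', ch) and not ch.isalpha()]

print(non_letters)  # ['[', '\\', ']', '^', '_', '`']
```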

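In Python, you can use [^\W\d_] with the re.U flag to match Unicode letters (see this post). With re.U, \w matches any Unicode letter, digit or underscore, so a class that excludes non-word characters (\W), digits (\d) and the underscore leaves essentially just the letters.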
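To illustrate how the class behaves as a separator-aware letter matcher, here is a small sketch (the pattern name letters and the sample text are made up for this example) that pulls the runs of letters out of a mixed string. It runs on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
import re

# With re.UNICODE, \w means "letter, digit or underscore" in the Unicode sense.
# [^\W\d_] keeps the \w characters but drops digits (\d) and "_".
letters = re.compile(r'[^\W\d_]+', re.UNICODE)

words = letters.findall(u'Grüße_2015, Émile!')
# words == [u'Grüße', u'Émile']: the underscore, the digits, the comma,
# the space and "!" all act as separators.
print(len(words))  # 2
```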

Here is a sample based on your input string.

Python example:

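# -*- coding: utf-8 -*-
# (Python 2 needs the line above to accept non-ASCII literals in a source file.)
import re

r = re.search(
    r'(?P<unicode_word>[^\W\d_]*)',  # letters only: \w minus digits and "_"
    u'TestöäüéàèÉÀÈéàè',
    re.U
)

print(r.group('unicode_word'))  # valid in both Python 2 and Python 3
# Output: TestöäüéàèÉÀÈéàè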
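The search above reports the leading run of letters. If the goal is instead to check that an entire string is alphabetic (the [A-z] use case from the question), one possible sketch anchors the same class. The names ONLY_LETTERS and is_alphabetic are hypothetical, and \Z is used instead of re.fullmatch because re.fullmatch only exists in Python 3.4+:

```python
# -*- coding: utf-8 -*-
import re

# re.match anchors at the start of the string and \Z anchors at the very end,
# so the whole string must consist of letters. Using + instead of * also
# rejects the empty string (a search with * always succeeds, possibly
# with an empty match).
ONLY_LETTERS = re.compile(r'[^\W\d_]+\Z', re.UNICODE)


def is_alphabetic(s):
    return ONLY_LETTERS.match(s) is not None


for s in [u'TestöäüéàèÉÀÈéàè', u'Test123', u'ö_ä', u'']:
    print(is_alphabetic(s))  # True, then False three times
```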
