I'm looking for the equivalent of [\w]&&[^\d]
(Of course && is not a regex operator).
The regex needs to match ONLY words made up of UTF8 "alphabet" characters. Does anyone have any ideas?

- NEVER perform regexes on encoded text. – Ignacio Vazquez-Abrams Apr 03 '12 at 06:07
- http://stackoverflow.com/questions/8923949/matching-only-a-unicode-letter-in-python-re – warvariuc Apr 03 '12 at 06:11
- Are you talking about the English alphabet? Then the [a-zA-Z] answers below will suffice. Otherwise you're in for a treat... – Jonas Byström Apr 03 '12 at 06:12
- "NEVER perform regexes on encoded text." This is for internationalized URL matching, not long-form text. – Thomas Apr 03 '12 at 07:18
- @IgnacioVazquez-Abrams "NEVER perform regexes on encoded text." How come there is an re.UNICODE flag then? I guess things break for you when you're not using that flag. – bpj Jul 09 '16 at 13:32
- @bpj: `re.UNICODE` doesn't make `re` work on encoded text, it makes various special sequences match non-ASCII characters. – Ignacio Vazquez-Abrams Jul 09 '16 at 14:45
5 Answers
The third-party `regex` module supports Unicode properties, which means that you can use `\p{L}` with it.
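With the `regex` module installed, the call would look like `regex.findall(r"\p{L}+", text)`. If installing a third-party module is not an option, a commonly used approximation with the standard `re` module is the double-negated class `[^\W\d_]`, which is `\w` minus digits and the underscore. This is a swapped-in technique, not `\p{L}` itself: it assumes Python 3 `str` patterns (Unicode-aware by default), and it may also admit a few numeric characters that `\d` does not cover. The names `LETTERS_ONLY`, `find_letter_runs` and `is_letter_word` are made up for this sketch.

```python
import re

# \w minus digits minus underscore, written as a negated class.
# Assumption: Python 3, where str patterns are Unicode-aware by default.
LETTERS_ONLY = re.compile(r"[^\W\d_]+")


def find_letter_runs(text):
    # Return every maximal run of letter-like characters in text.
    return LETTERS_ONLY.findall(text)


def is_letter_word(word):
    # True only if the whole string is one non-empty run of letters.
    return LETTERS_ONLY.fullmatch(word) is not None
```

Note that `find_letter_runs` splits at digits and underscores rather than rejecting the surrounding word, so `café_42` still yields `café`. Use `is_letter_word` when a token must consist of letters only.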

As Ignacio pointed out, [a-zA-Z] would not match non-ASCII letters, and the re module has no predefined character class for all Unicode letters. You may want to use something like the following, which is simple and straightforward:
re.findall("[" + string.letters + "]+", st)
Please note that string.letters is locale-dependent (and exists only in Python 2). Unless you need to switch the locale, which you can of course do with locale.setlocale(locale.LC_CTYPE, code), this should work like a breeze.

AFAICT, there isn't a regex that matches all letters but not digits or underscores. You could use \w and then check whether the matches are letters using the code point properties:
import unicodedata

def isletter(c):
    return unicodedata.category(c).startswith('L')
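A minimal sketch of how the two steps above might fit together, assuming Python 3: collect `\w+` candidates first, then keep only those whose characters all pass the category check. The helper name `letter_words` is hypothetical and not part of the original answer.

```python
import re
import unicodedata


def isletter(c):
    # Unicode general categories for letters all start with "L" (Lu, Ll, Lt, Lm, Lo).
    return unicodedata.category(c).startswith("L")


def letter_words(text):
    # Step 1: \w+ finds runs of word characters (letters, digits, underscore).
    # Step 2: discard any run containing a non-letter character.
    return [w for w in re.findall(r"\w+", text) if all(isletter(c) for c in w)]
```

With this design a candidate such as `w0rld` or `foo_bar` is dropped entirely instead of being split into its letter-only pieces, which fits the requirement of matching only whole words made of letters.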

Not sure about regex, but for Unicode you might be able to make use of the unicodedata module; specifically the unicodedata.category() function.
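To illustrate what that function reports, here is a small sketch (Python 3 assumed) that pairs each character with its two-letter general category. The helper name `describe` is invented for this example.

```python
import unicodedata


def describe(text):
    # Pair every character with its Unicode general category code.
    return [(c, unicodedata.category(c)) for c in text]


# describe("aÉ5_") returns [('a', 'Ll'), ('É', 'Lu'), ('5', 'Nd'), ('_', 'Pc')]
```

Letter categories begin with `L`, decimal digits are `Nd`, and the underscore is connector punctuation (`Pc`), so checking the first letter of the code is enough to separate letters from the other `\w` characters.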

Use [a-zA-Z] to match all the alphabet characters.

- Yes it is. It's just not an English alphabet character. – Ignacio Vazquez-Abrams Apr 03 '12 at 06:21
-
1