python re any unicode word except if it contains digits

Question

I need to match a string of words such as "Miguel Tiago", but not strings that contain digits. For example, I do not want to match "Miguel10 Tiago". The strings may also contain Unicode characters.

I can't simply do:

`re.match(ur'^[a-zA-Z ]+', string, re.UNICODE)`

because words containing non-ASCII letters such as 'ç' won't be matched.

How can I use the character class `[\w ]+` but exclude digits?
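A minimal Python 3 sketch of the problem described above. In Python 3, `str` patterns are Unicode-aware by default, so `re.UNICODE` is implied; the sample name is made up for illustration. It shows that the ASCII class stops at 'ç', while `\w` handles accented letters but also lets digits through.

```python
import re

# Python 3: str patterns use Unicode matching by default (re.UNICODE is implied).
name = "Conceição"  # made-up sample containing 'ç' and 'ã'

# The ASCII-only class stops at the first non-ASCII letter.
ascii_only = re.match(r"[a-zA-Z ]+", name).group()
print(ascii_only)  # Concei

# \w includes Unicode letters, so the whole name matches...
word_class = re.match(r"[\w ]+", name).group()
print(word_class)  # Conceição

# ...but \w also includes digits, so the unwanted string matches as well.
with_digits = re.match(r"[\w ]+", "Miguel10 Tiago").group()
print(with_digits)  # Miguel10 Tiago
```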

Actually, it is a dupe. `r'(?:(?!\d)[\w ])+'` will also work for you; see [this answer](https://stackoverflow.com/a/12349464/3832970). – Wiktor Stribiżew, Nov 14 '17 at 11:46
But in that case, for the example `Miguel10 Tiago` it returns the list `['Miguel','Tiago']`, which is not what I want. – Miguel, Nov 14 '17 at 11:47
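A short Python 3 check of the behaviour discussed in these two comments, using the thread's own example strings. `re.findall` with this pattern returns every digit-free run, so the digits merely split the string. `re.fullmatch` (Python 3.4+) must consume the entire string, so it rejects `Miguel10 Tiago` outright.

```python
import re

# A word character or space that is not a digit, repeated.
pattern = r"(?:(?!\d)[\w ])+"

# findall collects each digit-free run; the second piece keeps its leading space.
pieces = re.findall(pattern, "Miguel10 Tiago")
print(pieces)  # ['Miguel', ' Tiago']

# fullmatch succeeds only if the whole string is made of such characters.
rejected = re.fullmatch(pattern, "Miguel10 Tiago")
accepted = re.fullmatch(pattern, "Miguel Tiago")
print(rejected)  # None
print(accepted.group())  # Miguel Tiago
```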

Roars · Answer 1 · 2017-11-14T12:43:03.970

I think the solution you are looking for is: `(([^\W\d]+(?![\w\d])\s?)+)`
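A minimal Python 3 sketch of what the two parts of this pattern do. The sample strings are made up for illustration. `[^\W\d]` reads as "neither a non-word character nor a digit", i.e. `\w` minus digits. Unlike `\D`, it rejects punctuation and spaces, but it still accepts the underscore that `\w` includes. The `(?![\w\d])` lookahead stops a run of letters from ending right before a digit.

```python
import re

text = "Conceição-10 #Tiago_"  # made-up sample

# \D is "any non-digit", so punctuation, spaces and symbols get through.
non_digits = re.findall(r"\D+", text)
print(non_digits)  # ['Conceição-', ' #Tiago_']

# [^\W\d] is \w minus digits: Unicode letters, plus the underscore \w includes.
letters = re.findall(r"[^\W\d]+", text)
print(letters)  # ['Conceição', 'Tiago_']

answer = r"(([^\W\d]+(?![\w\d])\s?)+)"
no_lookahead = r"(([^\W\d]+\s?)+)"

# Without the lookahead, re.match accepts the letters in front of the digits.
loose = re.match(no_lookahead, "Miguel10 Tiago")
print(loose.group())  # Miguel

# With it, a letter run may not be followed by a word character, so matching fails.
strict = re.match(answer, "Miguel10 Tiago")
print(strict)  # None

# Accented names are accepted.
unicode_name = re.match(answer, "João Conceição").group()
print(unicode_name)  # João Conceição
```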

edited Nov 14 '17 at 12:43

answered Nov 14 '17 at 11:29

Roars

Thanks, and sorry, but I need to edit the question to make it clearer. – Miguel Nov 14 '17 at 11:40
OK, I edited the question. I also added a detail I forgot to include: the need to recognize more than one word. – Miguel Nov 14 '17 at 11:45
Surely in this case you could just do something with the `\D` (non-digit) character class. Just to be clear, you don't want it to include symbols, just Unicode letters? – Roars Nov 14 '17 at 11:46
I believe this should work: `([^\W\d]+(?![\w\d]))`. Let me know and I will write it up properly with an explanation. And this for multiple words: `(([^\W\d]+(?![\w\d]))\s)*` – Roars Nov 14 '17 at 12:01
Yes, I think that's it. The only thing now is that the multiple-word version is not working: `re.match(ur'(([^\W\d]+(?![\w\d]))\s)*',u'Miguel Tiago',re.UNICODE).group()` gives `Miguel ` – Miguel Nov 14 '17 at 12:05
Try `(([^\W\d]+(?![\w\d])\s?)+)`. I realised that you need to specify that there may or may not be a space after each word. – Roars Nov 14 '17 at 12:40
I've realised there is a flaw in this: if the digits appear within a word, this won't work as expected. – Roars Nov 14 '17 at 12:49
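Both problems raised in the comments can be reproduced with a short Python 3 sketch. With a mandatory `\s`, every repetition needs a trailing space, so the final word is dropped. Because of `*`, `re.match` also never returns `None` for that pattern. With `\s?`, `re.match` still anchors only at the start of the string, so digits inside a later word just end the match early instead of rejecting the string. Requiring the match to cover the whole string with `re.fullmatch` (Python 3.4+; in Python 2 one could append `\Z` to the pattern instead) rejects such strings. The last pattern, `simpler`, is a hypothetical variant suggested here, not one from the thread.

```python
import re

mandatory_space = r"(([^\W\d]+(?![\w\d]))\s)*"
optional_space = r"(([^\W\d]+(?![\w\d])\s?)+)"

# Each repetition needs whitespace after the word, so the last word is left out.
dropped = re.match(mandatory_space, "Miguel Tiago").group()
print(repr(dropped))  # 'Miguel '

# With * the group may repeat zero times, so an empty match is still a match.
empty = re.match(mandatory_space, "Miguel10 Tiago").group()
print(repr(empty))  # ''

# re.match anchors only at the start: digits in a later word end the match early.
partial = re.match(optional_space, "Miguel Tia10go").group()
print(repr(partial))  # 'Miguel '

# re.fullmatch requires the pattern to span the whole string.
rejected = re.fullmatch(optional_space, "Miguel Tia10go")
accepted = re.fullmatch(optional_space, "Miguel Tiago")
print(rejected, accepted.group())  # None Miguel Tiago

# Hypothetical simpler variant: letter runs separated by single whitespace
# characters. With fullmatch, the lookahead is no longer needed.
simpler = r"[^\W\d]+(?:\s[^\W\d]+)*"
samples = ["Miguel Tiago", "João Conceição", "Miguel10 Tiago", "Miguel Tia10go"]
results = {s: bool(re.fullmatch(simpler, s)) for s in samples}
print(results)
```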
