
I've read a great solution for Unicode strings here, but I need to check that an entire string consists only of letters, spaces, or dashes, and I can't think of a solution. The example below does not work as I want.

import re

name = u"Василий Соловьев-Седой"
r = re.compile(r'^([\s\-^\W\d_]+)$', re.U)
r.match(name)  # -> None
Stephan Olmer
  • Please define exactly what you mean by "letters", "spaces", and "dashes". – Feb 03 '12 at 11:18
  • Letters: any Unicode letters; spaces: the space character :); dashes: the "-" symbol. – Stephan Olmer Feb 03 '12 at 11:19
  • I think he means [a-zA-Z[UNICODE_LETTERS] -]*; the problem here is [UNICODE_LETTERS], right? – Eregrith Feb 03 '12 at 11:20
  • @Eregrith, no. To check for only Unicode letters in a string, r = re.compile(r'[^\W\d_]', re.U) is enough. It works for a string such as u"Василий", but not for u"Василий Соловьев-Седой". So I added "\s\-" to the regex, but now it does not match my string at all. I think there is an error in my regex, but I don't know where. – Stephan Olmer Feb 03 '12 at 11:27

2 Answers

r = re.compile(r'^(?:[^\W\d_]|[\s-])+$', re.U)

[^\W\d_] matches any letter (by matching any alphanumeric character except for digits and underscore).

[\s-] of course matches whitespace and dashes.
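As a quick check of the pattern above, here is a minimal sketch, assuming Python 3 (where `re.U` is already the default for `str` patterns, so passing it is harmless). The names `NAME_RE` and `is_valid_name` are hypothetical and not part of the answer.

```python
import re

# Pattern from the answer: one or more characters, each of which is either
# a Unicode letter ([^\W\d_]) or whitespace / a dash ([\s-]).
NAME_RE = re.compile(r'^(?:[^\W\d_]|[\s-])+$', re.U)


def is_valid_name(text):  # hypothetical helper name
    """Return True if text contains only letters, whitespace, and dashes."""
    return NAME_RE.match(text) is not None


print(is_valid_name("Василий Соловьев-Седой"))  # True
print(is_valid_name("Василий2"))                # False: contains a digit
print(is_valid_name("snake_case"))              # False: contains an underscore
```

Note that `\s` also accepts tabs and newlines; if only the plain space should be allowed, `[ -]` could be used in place of `[\s-]`.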

Tim Pietzcker
  • Did you test this on the string u"Василий Соловьев-Седой"? I got None as the result. – Stephan Olmer Feb 03 '12 at 11:29
  • @Tim, in researching this I found your other answer, http://stackoverflow.com/questions/1716609/how-to-match-cyrillic-characters-with-a-regular-expression. Can you explain the difference? Is it that re.U allows the word token \W to act the same way as the \p Unicode token? – David Hall Feb 03 '12 at 11:31
  • My mistake. I tested the regex on the wrong string. @Tim, your solution works perfectly. Thanks. – Stephan Olmer Feb 03 '12 at 11:36
  • @DavidHall: In a way, yes. Python doesn't support Unicode properties directly, but the `\w` shortcut is Unicode-aware (if you use `re.U`). – Tim Pietzcker Feb 03 '12 at 13:39
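The Unicode-awareness of `\w` mentioned in the last comment can be seen in a small sketch, assuming Python 3: there `str` patterns are Unicode-aware by default, and `re.ASCII` switches that off, which roughly mirrors what omitting `re.U` did for this thread's Python 2 code. The variable names are illustrative only.

```python
import re

word = "Василий"

# With Unicode matching, \w covers Cyrillic letters.
unicode_match = re.match(r'^\w+$', word, re.U)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so nothing matches.
ascii_match = re.match(r'^\w+$', word, re.ASCII)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
```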

If you ONLY want to check:

name = u"Василий Соловьев-Седой"
name = name.replace("-", "").replace(" ", "")
name.isalpha()
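A minimal sketch of the same idea wrapped in a function, assuming Python 3; `letters_spaces_dashes` is a hypothetical name, not from the answer. It only strips the plain space and "-" characters, and it rejects input that contains no letters at all.

```python
def letters_spaces_dashes(text):  # hypothetical helper name
    """Strip spaces and dashes, then check that only letters remain."""
    stripped = text.replace("-", "").replace(" ", "")
    # str.isalpha() is False for the empty string, so input made up
    # solely of spaces and dashes is rejected.
    return stripped.isalpha()


print(letters_spaces_dashes("Василий Соловьев-Седой"))  # True
print(letters_spaces_dashes("Василий 2"))               # False: contains a digit
print(letters_spaces_dashes(" - "))                     # False: no letters left
```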
psola