How to check if a string is a word with regex?

Question

I want to check whether a string input by the user (like 'hello') contains exactly one word and nothing else. The check should only be true for strings made up solely of [a-zA-Z] characters: no whitespace, dots, underscores, or additional words.

For example:

'hello' true
'hello_' false
'hello world' false
'h.e.l.l.o' false

I don't know how to write the regex. Need help.

You wrote it in your question, just add `+` to it: `[a-zA-Z]+`. You may also want to anchor it: `^[a-zA-Z]+$`. — ctwheels, Jan 09 '18 at 15:53
Possible duplicate of [Allow Only Alpha Characters in Python?](https://stackoverflow.com/questions/12382696/allow-only-alpha-characters-in-python) — ctwheels, Jan 09 '18 at 15:56
@WillemVanOnsem why did you delete your answer? It was perfectly valid. — ctwheels, Jan 09 '18 at 15:59
@WillemVanOnsem that may be what the OP is searching for. I understand they wrote `a-zA-Z`, but even if it's not what the OP is searching for maybe it will help another user. Just write a note letting users know that it matches Unicode characters and the `Lm`, `Lt`, `Lu`, `Ll`, `Lo` categories. Or just add logic from [this](https://stackoverflow.com/questions/196345/how-to-check-if-a-string-in-python-is-in-ascii) post to your answer. — ctwheels, Jan 09 '18 at 16:04
@WillemVanOnsem yeah, str.isalpha solved my problem perfectly. I don't think I will ever encounter diacritics here, since the string is part of a URL. I would accept your answer anyway. Thanks, man. — Vito G., Jan 09 '18 at 16:15
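For reference, the Python documentation defines "alphabetic" for str.isalpha as exactly the Unicode general categories Lm, Lt, Lu, Ll and Lo mentioned in the comments above. The sketch below spells that rule out with the standard unicodedata module; the helper name `is_letters_only` is hypothetical, and it should agree with isalpha because both use the Unicode database bundled with your Python.

```python
import unicodedata

# General categories that Python's docs list as "Letter" for str.isalpha
LETTER_CATEGORIES = {"Lm", "Lt", "Lu", "Ll", "Lo"}

def is_letters_only(s: str) -> bool:
    # Non-empty, and every character falls into one of the Letter categories
    return bool(s) and all(unicodedata.category(c) in LETTER_CATEGORIES for c in s)

for s in ["hello", "hello_", "ä", "日本", "h.e.l.l.o", ""]:
    # The two columns should always match
    print(repr(s), is_letters_only(s), s.isalpha())
```

This makes it visible why 'ä' or '日本' pass: they are letters, just not ASCII ones.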

Willem Van Onsem · Accepted Answer · 2018-01-09T16:45:14.617

There is no need to write a regex here. Python already has this built in as str.isalpha:

Return True if all characters in the string are alphabetic and there is at least one character, False otherwise.

So we can check it with:

if your_string.isalpha():
    pass
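Applied to the examples from the question, a minimal sketch (the helper name `is_single_word` is hypothetical) looks like this; note that the empty string is rejected too, because isalpha requires at least one character:

```python
def is_single_word(s: str) -> bool:
    # True only if s is non-empty and every character is alphabetic
    return s.isalpha()

for candidate in ["hello", "hello_", "hello world", "h.e.l.l.o", ""]:
    print(repr(candidate), is_single_word(candidate))
# Only 'hello' prints True; the other four print False
```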

Note, however, that str.isalpha also accepts diacritics, etc. For example:
>>> 'ä'.isalpha()
True
This is not per se a problem, but it is something to take into account.
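To see where str.isalpha and the ASCII-only class [a-zA-Z] from the question part ways, here is a short comparison sketch (the name `ASCII_WORD` is just an illustrative choice):

```python
import re

# ASCII-only "word": one or more of a-z / A-Z, covering the whole string
ASCII_WORD = re.compile(r"[a-zA-Z]+")

for s in ["hello", "café", "straße", "привет"]:
    # isalpha() accepts any Unicode letter; the ASCII class does not
    print(repr(s), s.isalpha(), ASCII_WORD.fullmatch(s) is not None)
```

All four strings pass isalpha, but only 'hello' matches the ASCII pattern.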

If you do not want diacritics, you can, for instance, additionally check that every character has an ord(..) less than 128:

if your_string.isalpha() and all(ord(c) < 128 for c in your_string):
    pass
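On Python 3.7 and later, str.isascii() expresses the same ord(c) < 128 test directly. A minimal sketch, assuming Python 3.7+ (the helper name `is_ascii_word` is hypothetical):

```python
def is_ascii_word(s: str) -> bool:
    # isascii() (Python 3.7+) is True when every code point is < 128;
    # combined with isalpha() this accepts exactly non-empty [a-zA-Z]+ strings
    return s.isascii() and s.isalpha()

print(is_ascii_word("hello"))  # True
print(is_ascii_word("héllo"))  # False
print(is_ascii_word(""))       # False ("".isascii() is True, but "".isalpha() is False)
```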

The advantage of using builtins is that they are more self-explanatory (isalpha() clearly suggests what it does), and they are very unlikely to contain bugs. (I am not saying that other approaches do contain bugs, but code you write yourself is typically not tested as thoroughly, so it may not fully cover edge and corner cases.)

Nitpick: This will also accept accented letters, umlauts, and probably a bunch more, but maybe that's exactly what OP actually wants. (I did not mean that that's wrong, just wanted to point this out.) — tobias_k, Jan 09 '18 at 15:58
You can remove diacritics from the string using the methods discussed on [this thread](https://stackoverflow.com/questions/196345/how-to-check-if-a-string-in-python-is-in-ascii) if necessary. — ctwheels, Jan 09 '18 at 16:27

Ajax1234 · Answer 2 · 2018-01-09T16:12:28.163


You can use the anchors ^ and $:

import re
s = "hello"
if re.findall('^[a-zA-Z]+$', s):
    pass  # string condition met
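One subtlety with `^...$`: without re.MULTILINE, `$` also matches just before a trailing newline, so the pattern above accepts "hello\n". re.fullmatch (Python 3.4+) or the `\A...\Z` anchors require the match to cover the whole string. A short demonstration:

```python
import re

s = "hello\n"

# '$' may match right before a final newline, so this still finds 'hello'
print(re.findall(r"^[a-zA-Z]+$", s))   # ['hello']

# fullmatch must consume the entire string, newline included
print(re.fullmatch(r"[a-zA-Z]+", s))   # None

# \Z anchors at the very end of the string only
print(re.search(r"\A[a-zA-Z]+\Z", s))  # None
```

This matters when the input comes from something like input() or a file line that still carries its newline.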

Performance comparison between re.findall and re.match:

import timeit

s1 = """
import re
re.findall('^[a-zA-Z]+$', 'hello')
"""
print(timeit.timeit(stmt=s1, number=10000))
# output in the original run: 0.0147941112518

s2 = """
import re
re.match('^[a-zA-Z]+$', 'hello')
"""
print(timeit.timeit(stmt=s2, number=10000))
# output in the original run: 0.0134868621826

While re.match performs slightly better than re.findall, I prefer re.findall because 1) its results are easier to inspect at a glance and 2) they are immediately stored in a list.
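If the check runs many times, compiling the pattern once and calling fullmatch avoids both building a list (as findall does) and the per-call lookup in the re module's internal pattern cache. A hedged timing sketch you can run yourself; the names `WORD`, `is_word` and `variants` are hypothetical, and no numbers are shown because they vary by machine and Python version:

```python
import re
import timeit

# Compile once; fullmatch anchors both ends, so no ^ or $ is needed
WORD = re.compile(r"[a-zA-Z]+")

def is_word(s: str) -> bool:
    # fullmatch returns a Match object or None; bool() turns it into True/False
    return bool(WORD.fullmatch(s))

variants = {
    "re.findall": lambda: bool(re.findall(r"^[a-zA-Z]+$", "hello")),
    "re.match": lambda: bool(re.match(r"^[a-zA-Z]+$", "hello")),
    "compiled fullmatch": lambda: is_word("hello"),
}

for name, fn in variants.items():
    # Every variant returns True for "hello"; only the timing differs
    print(name, timeit.timeit(fn, number=10000))
```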

edited Jan 09 '18 at 16:12 · answered Jan 09 '18 at 15:54 by Ajax1234

But using `match` you don't need `^` and `$`, and a list is not necessary (if only because you *must* test an entire string, so it can only occur zero or one time). – Jongware Jan 09 '18 at 16:27
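On the point about `match`: re.match anchors only at the start of the string, so `^` is indeed redundant there, but the end still needs `$` (or `\Z`); otherwise any string that merely begins with letters passes. re.fullmatch anchors both ends. A quick check:

```python
import re

s = "hello world"

# match() is anchored at the start only, so it matches 'hello' and succeeds
print(bool(re.match(r"[a-zA-Z]+", s)))      # True

# keeping '$' (or using fullmatch) rejects the trailing ' world'
print(bool(re.match(r"[a-zA-Z]+$", s)))     # False
print(bool(re.fullmatch(r"[a-zA-Z]+", s)))  # False
```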
