Lookarounds to the rescue!
^(?!\d+$)\w+$
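This uses a negative lookahead and anchors: (?!\d+$) rejects strings that consist only of digits, while ^\w+$ requires the whole string to be made of word characters. See a demo on regex101.com.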
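As a quick illustration, here is a minimal sketch of how the expression could be applied in Python. The sample strings and variable names are made up for this example.

```python
import re

# The expression from above: word characters only, but not digits alone.
pattern = re.compile(r"^(?!\d+$)\w+$")

# Illustrative inputs, not taken from the question.
candidates = ["abc", "abc123", "123", "12_3"]

matches = [item for item in candidates if pattern.search(item)]
print(matches)
# ['abc', 'abc123', '12_3']
```

Only "123" is dropped, because the lookahead sees nothing but digits up to the end of the string.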
Note that you could get the same result for these samples with plain Python:
samples = ["expression123", "123expression", "exp123ression", "1235234567544"]
filtered = [item for item in samples if not item.isdigit()]
print(filtered)
# ['expression123', '123expression', 'exp123ression']
See another demo on ideone.com.
Neither approach treats input strings like -1 or 1.0 as numbers: the isdigit() filter lets them through, and the regex rejects them only because - and . are not word characters.
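It is worth checking how each approach actually handles such inputs. The following is a small sketch with illustrative names; note that the regex does not match -1 or 1.0, but only because \w does not cover - and ., not because it recognises them as numbers.

```python
import re

pattern = re.compile(r"^(?!\d+$)\w+$")

# Illustrative edge cases: a signed integer, a float, and a plain integer.
edge_cases = ["-1", "1.0", "42"]

results = {}
for item in edge_cases:
    kept_by_isdigit_filter = not item.isdigit()        # the pure Python approach
    matched_by_regex = pattern.search(item) is not None  # the regex approach
    results[item] = (kept_by_isdigit_filter, matched_by_regex)
    print(item, kept_by_isdigit_filter, matched_by_regex)
# -1 True False
# 1.0 True False
# 42 False False
```

If signed or decimal numbers matter for your data, both approaches would need an extra check.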
Tests
Since a discussion about the two expressions came up, here's a small test suite for different sample sizes and expressions:
import string, random, re, timeit


class RegexTester():
    samples = []
    # raw strings, so that \D, \w and \d are passed to the regex engine unchanged
    expressions_to_test = {"Cary": r"^(?=.*\D)\w+$",
                           "Jan": r"^(?!\d+$)\w+$"}

    def __init__(self, sample_size=100, word_size=10, times=100):
        self.sample_size = sample_size
        self.word_size = word_size
        self.times = times

        # generate samples
        self.samples = ["".join(random.choices(string.ascii_letters + string.digits, k=self.word_size))
                        for _ in range(self.sample_size)]

        # compile the expressions in question
        for key, expression in self.expressions_to_test.items():
            self.expressions_to_test[key] = {"raw": expression, "compiled": re.compile(expression)}

    def describe_sample(self):
        # return the samples that consist only of digits
        only_digits = [item for item in self.samples if all(char.isdigit() for char in item)]
        return only_digits

    def test_expressions(self):
        def regex_test(samples, expr):
            return [expr.search(item) for item in samples]

        for key, values in self.expressions_to_test.items():
            t = timeit.Timer(lambda: regex_test(self.samples, values["compiled"]))
            # note: the timer runs 100 iterations; self.times only appears in the printed label
            print("{key}, Times: {times}, Result: {result}".format(key=key,
                                                                   times=self.times,
                                                                   result=t.timeit(100)))


rt = RegexTester(sample_size=10 ** 5, word_size=10, times=10 ** 4)
#rt.describe_sample()
rt.test_expressions()
For a sample size of 10^5 and a word size of 10, this gave comparable results for both expressions:
Cary, Times: 10000, Result: 6.1406331
Jan, Times: 10000, Result: 5.948537699999999
When you set the sample size to 10^4 and the word size to 10^3, the results are again comparable:
Cary, Times: 10000, Result: 10.1723557
Jan, Times: 10000, Result: 9.697761900000001
You'll get significant differences when the strings consist only of digits (i.e. the samples are generated from digits alone):
Cary, Times: 10000, Result: 25.4842013
Jan, Times: 10000, Result: 17.3708319
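The digit-only case can be reproduced by drawing the samples from string.digits alone. The following is a minimal sketch, not part of the test suite above: digit_samples is a hypothetical helper, and the sample sizes are kept small on purpose. It also confirms that both expressions agree on such input.

```python
import random
import re
import string

cary = re.compile(r"^(?=.*\D)\w+$")
jan = re.compile(r"^(?!\d+$)\w+$")


def digit_samples(sample_size, word_size):
    """Hypothetical helper: random strings made up of ASCII digits only."""
    return ["".join(random.choices(string.digits, k=word_size))
            for _ in range(sample_size)]


samples = digit_samples(sample_size=1000, word_size=10)

# Neither expression should match a digit-only string.
cary_hits = [bool(cary.search(item)) for item in samples]
jan_hits = [bool(jan.search(item)) for item in samples]
print(cary_hits == jan_hits, any(jan_hits))
# True False
```

One plausible reason for the gap is that (?=.*\D) has to consume the whole string and then backtrack through it before it can fail, whereas (?!\d+$) decides after a single pass. Treat that as an assumption to verify against your own data rather than a measured fact.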
Note that this is randomly generated text and, due to the way it is generated, the longer the strings are, the less likely they are to consist only of digits. In the end it will depend on the actual text inputs.
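To put a number on that: with the generator used above, each character is drawn uniformly from 62 symbols (52 ASCII letters plus 10 digits), so a string of length k is all digits with probability (10/62)^k. A rough sketch, where p_all_digits is an illustrative name:

```python
import string

alphabet_size = len(string.ascii_letters + string.digits)  # 62 symbols
digit_share = len(string.digits) / alphabet_size            # 10 / 62


def p_all_digits(word_size):
    """Probability that a uniformly drawn string of this length has digits only."""
    return digit_share ** word_size


for word_size in (1, 5, 10):
    p = p_all_digits(word_size)
    # expected number of digit-only strings among 10^5 samples
    print(word_size, p, p * 10 ** 5)
```

For a word size of 10 this comes out at roughly 1.2e-8, so even 10^5 random samples will almost never contain a digit-only string.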