Is there a better way to find if string contains digits?

Question

I'm working with strings that contain either a mix of digits and letters or only digits, but never only letters. To catch false matches, I need to check that each string contains at least one digit and print an error message if it doesn't. I have been using the following code:

s = '0798237 sh 523-123-asdjlh'

def contains_digits(s):
    for char in list(s):
        if char.isdigit():
            return True
            break
    return False

if contains_digits(s) == True:
    print s
else:
    print 'Error'

Is there a more Pythonic or simpler way to do this, or does this suffice? Also, I can't just check whether the string is alphanumeric, because the string may contain various symbols ('-', spaces, etc.).

By the way, the `== True` in `contains_digits(s) == True` is redundant. You can drop it and the code will behave the same way. – SomeKittens, Jun 27 '12 at 18:15

score 40 · Accepted Answer · edited May 23 '17 at 11:53

This is one of those places where a regular expression is just the thing:

import re

_digits = re.compile(r'\d')  # raw string so \d reaches the regex engine unchanged
def contains_digits(d):
    return bool(_digits.search(d))

Little demo:

>>> import re
>>> _digits = re.compile(r'\d')
>>> def contains_digits(d):
...     return bool(_digits.search(d))
... 
>>> contains_digits('0798237 sh 523-123-asdjlh')
True
>>> contains_digits('sh asdjlh')
False

You could use the any() function with .isdigit() as described in @Wallacolloo's answer, but that's slower than the simple regular expression:

>>> import timeit
>>> timeit.timeit("contains_digits('0798237 sh 523-123-asdjlh')", 'from __main__ import contains_digits')
0.77181887626647949
>>> timeit.timeit("contains_digits_any('0798237 sh 523-123-asdjlh')", 'from __main__ import contains_digits_any')
1.7796030044555664

The explicit loop with if (the approach from the question) is on par with the regular expression:

>>> timeit.timeit("contains_digits_if('0798237 sh 523-123-asdjlh')", 'from __main__ import contains_digits_if')
0.87261390686035156

But things get worse if the digits appear late in the text:

>>> timeit.timeit("contains_digits('asdjlhtaheoahueoaea 11 thou')", 'from __main__ import contains_digits')
1.202538013458252
>>> timeit.timeit("contains_digits_any('asdjlhtaheoahueoaea 11 thou')", 'from __main__ import contains_digits_any')
5.0348429679870605
>>> timeit.timeit("contains_digits_if('asdjlhtaheoahueoaea 11 thou')", 'from __main__ import contains_digits_if')
3.707183837890625

Timings tested on Python 2.6 on Mac OS X 10.7.
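
The timing calls above refer to `contains_digits_any` and `contains_digits_if`, which are not defined in this answer. Here is a minimal sketch of what the three functions might look like, assuming `contains_digits_any` is the `any()`-based version from the other answer and `contains_digits_if` is the explicit loop from the question. It is written for Python 3, so the numbers it prints will not match the Python 2.6 timings quoted above.

```python
import re
import timeit

_digits = re.compile(r'\d')


def contains_digits(d):
    # Regular-expression version from this answer
    return bool(_digits.search(d))


def contains_digits_any(d):
    # Assumed to be the any() + isdigit() version
    return any(char.isdigit() for char in d)


def contains_digits_if(d):
    # Assumed to be the question's explicit loop
    for char in d:
        if char.isdigit():
            return True
    return False


if __name__ == '__main__':
    cases = ('0798237 sh 523-123-asdjlh', 'asdjlhtaheoahueoaea 11 thou')
    for text in cases:
        for func in (contains_digits, contains_digits_any, contains_digits_if):
            elapsed = timeit.timeit(lambda: func(text), number=100000)
            print('{:<20} {!r:<30} {:.3f}s'.format(func.__name__, text, elapsed))
```

Results depend heavily on the Python version and the position of the first digit, so it is worth running this on your own inputs before choosing one approach for speed.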

answered Jun 27 '12 at 18:14 by Martijn Pieters · edited May 23 '17 at 11:53 by Community

Why not simply `return bool(_digits.search(d))`? – DSM Jun 27 '12 at 18:21
@DSM: Because I didn't think of that quickly enough? :-) – Martijn Pieters Jun 27 '12 at 18:22
The timeit results on mine are roughly the same... (about .6 for both) – Jon Clements Jun 27 '12 at 18:38
I get 0.84s vs 0.96s (2.7.3). But even if the RE were several times faster, I'd only use it if this were a bottleneck. I much prefer writing in Python. – DSM Jun 27 '12 at 18:39
@JonClements: On what version of Python? My timings were on 2.6, on Mac OS 10.7. On 2.7 the results are a little closer (0.95 vs 1.24), but the regular expression is still a little more efficient. Timings vary far more when the digits are *late* in the string. – Martijn Pieters Jun 27 '12 at 18:44
Python 2.7.3 OS: Linux 3.2.0-23 (Ubuntu 12.04) - still - it's concerning that 2.6 is nearly half the speed using `any`! – Jon Clements Jun 27 '12 at 18:48
I s'pose `re` is more optimised and `any` has to be generic so can't make assumptions.... Interesting to know though! – Jon Clements Jun 27 '12 at 18:49
@JonClements: just tested again with 2.7.3 on Mac and the 'late digit' test shows re to be 3.5 times as fast still. – Martijn Pieters Jun 27 '12 at 18:51
@JonClements: my guess is that the `isdigit()` test is inefficient; the more characters it is called on before a digit is found, the worse it gets. – Martijn Pieters Jun 27 '12 at 18:56
@Martijn Pieters: You're right. `isdigit(d)` is significantly slower than just testing `d in "0123456789"`. It might be because isdigit works on things such as "123", testing that *all* characters are digits. – Ponkadoodle Jun 27 '12 at 19:38
Hi, I am trying to do the same thing: check whether a string contains a number/digit. Using the method above, I get "global name '_digits' is not defined". – Shiva Krishna Bavandla Aug 07 '13 at 07:03
Seems you forgot a line then. – Martijn Pieters Aug 07 '13 at 07:32

score 16 · Answer 2 · edited Nov 14 '14 at 00:44

Use the any function, passing in a sequence.
If any element of the sequence is true (i.e. is a digit, in this case), then any returns True; otherwise it returns False. See https://docs.python.org/library/functions.html#any

def contains_digits(s):
    return any(char.isdigit() for char in s)

If you're concerned about performance though, your current method is actually faster.
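
To illustrate how `any` stops at the first true element, here is a small sketch. The helper `examined_before_answer` is a hypothetical name introduced only for this demonstration; it records which characters were inspected before `any` returned.

```python
def contains_digits(s):
    return any(char.isdigit() for char in s)


def examined_before_answer(s):
    """Return (result, characters inspected) to show where any() stops."""
    seen = []

    def checks():
        for char in s:
            seen.append(char)      # remember every character any() pulls
            yield char.isdigit()

    result = any(checks())
    return result, ''.join(seen)


print(examined_before_answer('ab1cd'))  # (True, 'ab1')
print(examined_before_answer('abcd'))   # (False, 'abcd')
```

Because a generator expression is passed in, no intermediate list is built and the characters after the first digit are never examined.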

answered Jun 27 '12 at 18:15 by Ponkadoodle · edited Nov 14 '14 at 00:44 by twasbrillig

Unfortunately, in this particular case this runs at half the speed of a regular expression. – Martijn Pieters Jun 27 '12 at 18:27
Thanks, didn't know about the 'any' function. So it appears that python strings are iterable, and 'for char in s' will go through each char in the string? – aensm Jun 27 '12 at 18:28
@aensm: Indeed, python strings are [sequences too](http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange). – Martijn Pieters Jun 27 '12 at 18:31
@Martijn Pieters: Thanks for showing the benchmark. I would never have expected a regular expression to outperform even unoptimized code like this. I was not thinking in terms of performance - if I was, I would have avoided `any` and written something like what I'm about to edit into my post, which outperforms the regular expression solution (in 32 bit Python 2.6 on Win7) by about 1.062 vs 1.254. – Ponkadoodle Jun 27 '12 at 19:02
@Wallacooloo: I'd have expected `any` to be just as fast as that, actually. – Martijn Pieters Jun 27 '12 at 19:03
@Wallacoloo: Updated the timings. The `if` + `.isdigit` combo is only competitive if the first digit is found early in the string. – Martijn Pieters Jun 27 '12 at 19:10
@Martijn Pieters: Makes sense. I had no clue regular expressions were that fast! Although it makes sense because that solution spends all its time executing C code rather than a mix of interpreter and library code. Benchmarking is always fun :) – Ponkadoodle Jun 27 '12 at 19:17
@Wallacoloo: RE's are fast *when used correctly*, but often misunderstood and used for non-regular situations. – Martijn Pieters Jun 27 '12 at 19:19

score 4 · Answer 3 · 2017-10-27T02:28:00.073

After reading the discussion above, I was curious about the performance of a set-based version like this:

def contains_digit(s, digits=set('0123456789')):
    return bool(digits.intersection(s))

In my testing, this was slightly faster on average than the re version on one computer and slightly slower on another (?). Just for fun, I compared some other versions as well.

import re
import timeit


def contains_digit_set_intersection(s, digits=set('0123456789')):
    return bool(digits.intersection(s))


def contains_digit_iter_set(s, digits=set('0123456789')):
    for c in s:
        if c in digits:
            return True
    return False


def contains_digit_iter_str(s, digits='0123456789'):
    for c in s:
        if c in digits:
            return True
    return False


def contains_digit_re(s, digits=re.compile(r'\d')):
    return bool(digits.search(s))


def print_times(func, times):
    name = func.__name__
    average = sum(times) / len(times)
    formatted_times = ' '.join('{:.3f}'.format(t) for t in times)
    message = '{name:<31} {times} ~{average:.3f}'
    print(message.format(name=name, times=formatted_times, average=average))


funcs = [
    contains_digit_set_intersection,
    contains_digit_iter_set,
    contains_digit_iter_str,
    contains_digit_re,
]


cases = [
    '1bcdefg7',
    'abcdefg7',
    'abcdefgh',
    '0798237 sh 523-123-asdjlh',
    'asdjlhtaheoahueoaea 11 thou',
]


for func in funcs:
    times = []
    for case in cases:
        func_case = '{func.__name__}("{case}")'.format(func=func, case=case)
        time = timeit.timeit(func_case, globals={func.__name__: func})
        times.append(time)
    print_times(func, times)

Sample runs for the two computers (time for each case and the ~average):

contains_digit_set_intersection 0.744 0.731 0.724 1.227 1.113 ~0.908
contains_digit_iter_set         0.264 0.541 0.566 0.260 1.068 ~0.540
contains_digit_iter_str         0.272 0.649 0.632 0.274 1.211 ~0.607
contains_digit_re               0.748 0.854 0.679 0.744 1.006 ~0.806

contains_digit_set_intersection 0.860 0.870 0.855 1.456 1.357 ~1.080
contains_digit_iter_set         0.285 0.613 0.617 0.307 1.163 ~0.597
contains_digit_iter_str         0.295 0.748 0.799 0.288 1.595 ~0.745
contains_digit_re               1.157 1.236 0.927 1.086 1.450 ~1.171

For me it looks like `contains_digit_iter_set` should actually look like this: `for d in digits: if d in s: return True`. Or at least that should be taken into consideration. — Hacker, Jun 13 '21 at 22:27
Also it might be wiser to define the `digits` outside of the function or use `string.digits` or use `s.isdigit()` instead of `string in digits` — Hacker, Jun 13 '21 at 22:35
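
The variant suggested in the comments above (loop over the ten digits and test each against the string, using `string.digits`) could be sketched as follows. The name `contains_digit_iter_digits` is hypothetical and not part of the original benchmark, and no timings are claimed for it.

```python
import string


def contains_digit_iter_digits(s, digits=string.digits):
    # Loop over the ten ASCII digits; each `d in s` is a substring search
    # over the whole string, so s may be scanned up to ten times.
    for d in digits:
        if d in s:
            return True
    return False


print(contains_digit_iter_digits('abcdefg7'))  # True
print(contains_digit_iter_digits('abcdefgh'))  # False
```

The trade-off is that the Python-level loop runs at most ten times regardless of string length, but a string without digits is scanned once per digit.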

score 2 · Answer 4 · answered Dec 05 '15 at 19:14

For those seeking a shorter solution: any(d in s for d in '0123456789')
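
One difference worth knowing when choosing between these approaches: the one-liner above only recognises the ASCII digits 0-9, whereas in Python 3 both `str.isdigit()` and an unflagged `\d` pattern also accept digits from other scripts. The sketch below assumes Python 3; the function and variable names are illustrative only.

```python
import re


def contains_ascii_digit(s):
    # The one-liner from this answer: ASCII 0-9 only
    return any(d in s for d in '0123456789')


def contains_unicode_digit(s):
    # str.isdigit() also accepts non-ASCII digit characters
    return any(char.isdigit() for char in s)


unicode_aware = re.compile(r'\d')          # Unicode decimal digits for str patterns
ascii_only = re.compile(r'\d', re.ASCII)   # restricts \d to [0-9]

sample = 'price: \u0663'  # ARABIC-INDIC DIGIT THREE

print(contains_ascii_digit(sample))         # False
print(contains_unicode_digit(sample))       # True
print(bool(unicode_aware.search(sample)))   # True
print(bool(ascii_only.search(sample)))      # False
```

If only 0-9 should count as digits, the explicit character set or `re.ASCII` is the safer choice.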

answered Dec 05 '15 at 19:14 by Nickmaovich
