
I'm working with strings that contain both digits and letters, or just digits, but never just letters. To test for false matches, I need to check whether a string contains at least one digit and print an error message if it doesn't. I have been using the following code:

s = '0798237 sh 523-123-asdjlh'

def contains_digits(s):
    for char in list(s):
        if char.isdigit():
            return True
            break
    return False

if contains_digits(s) == True:
    print(s)
else:
    print('Error')

Is there a more Pythonic or simpler way to do this, or does this suffice? Also, I can't just check whether the string is alphanumeric, because it may contain various symbols ('-', spaces, etc.).

aensm
  • By the way, `contains_digits(s) == True` is redundant. You can drop the `== True` part and it'll operate the same way. – SomeKittens Jun 27 '12 at 18:15

4 Answers


This is one of those places where a regular expression is just the thing:

import re

_digits = re.compile(r'\d')
def contains_digits(d):
    return bool(_digits.search(d))

Little demo:

>>> import re
>>> _digits = re.compile(r'\d')
>>> def contains_digits(d):
...     return bool(_digits.search(d))
... 
>>> contains_digits('0798237 sh 523-123-asdjlh')
True
>>> contains_digits('sh asdjlh')
False
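One caveat if you're on Python 3: for `str` patterns, `\d` matches any Unicode decimal digit (category Nd), not just 0-9. If only ASCII digits should count, a minimal sketch using the `re.ASCII` flag (the names `_any_digit`, `_ascii_digit` and `contains_ascii_digits` are just for illustration, and `re.ASCII` does not exist on Python 2):

```python
import re

# On Python 3 str patterns, \d matches any Unicode decimal digit (category Nd)
_any_digit = re.compile(r'\d')
# re.ASCII restricts \d to [0-9]; the pattern r'[0-9]' would behave the same way
_ascii_digit = re.compile(r'\d', re.ASCII)


def contains_digits(s):
    return bool(_any_digit.search(s))


def contains_ascii_digits(s):
    return bool(_ascii_digit.search(s))


arabic_indic_three = '\u0663'  # ARABIC-INDIC DIGIT THREE, a Unicode decimal digit
print(contains_digits('abc' + arabic_indic_three))        # True
print(contains_ascii_digits('abc' + arabic_indic_three))  # False
print(contains_ascii_digits('0798237 sh 523-123-asdjlh'))  # True
```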

You could use the `any()` function with `.isdigit()`, as described in @Wallacoloo's answer, but that's slower than the simple regular expression:

>>> import timeit
>>> timeit.timeit("contains_digits('0798237 sh 523-123-asdjlh')", 'from __main__ import contains_digits')
0.77181887626647949
>>> timeit.timeit("contains_digits_any('0798237 sh 523-123-asdjlh')", 'from __main__ import contains_digits_any')
1.7796030044555664

The `for`/`if` loop with `.isdigit()` is on par with the regular expression:

>>> timeit.timeit("contains_digits_if('0798237 sh 523-123-asdjlh')", 'from __main__ import contains_digits_if')
0.87261390686035156

But the gap widens when the first digit appears late in the text:

>>> timeit.timeit("contains_digits('asdjlhtaheoahueoaea 11 thou')", 'from __main__ import contains_digits')
1.202538013458252
>>> timeit.timeit("contains_digits_any('asdjlhtaheoahueoaea 11 thou')", 'from __main__ import contains_digits_any')
5.0348429679870605
>>> timeit.timeit("contains_digits_if('asdjlhtaheoahueoaea 11 thou')", 'from __main__ import contains_digits_if')
3.707183837890625

Timings measured with Python 2.6 on Mac OS X 10.7.
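The timing calls above import `contains_digits_any` and `contains_digits_if`, which this answer doesn't show. A sketch for rerunning the comparison on Python 3, assuming they are the `any()` version from the other answer and the question's explicit loop (both are reconstructions, not necessarily the exact code that was timed). Passing a callable to `timeit.timeit` avoids the `'from __main__ import ...'` setup string, at the cost of one extra lambda call per iteration; absolute numbers on a modern interpreter will differ from the ones quoted here.

```python
import re
import timeit

_digits = re.compile(r'\d')


def contains_digits(d):
    # Regular-expression version from this answer
    return bool(_digits.search(d))


def contains_digits_any(s):
    # Assumed: any() over a generator, as in the any()-based answer
    return any(char.isdigit() for char in s)


def contains_digits_if(s):
    # Assumed: the question's explicit loop, returning at the first digit
    for char in s:
        if char.isdigit():
            return True
    return False


if __name__ == '__main__':
    for text in ('0798237 sh 523-123-asdjlh', 'asdjlhtaheoahueoaea 11 thou'):
        for func in (contains_digits, contains_digits_any, contains_digits_if):
            # The lambda is called immediately by timeit, so it sees the current func/text
            seconds = timeit.timeit(lambda: func(text), number=100000)
            print('{:<22} {!r:<30} {:.3f}'.format(func.__name__, text, seconds))
```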

Community
Martijn Pieters
  • Why not simply `return bool(_digits.search(d))`? – DSM Jun 27 '12 at 18:21
  • @DSM: Because I didn't think of that quickly enough? :-) – Martijn Pieters Jun 27 '12 at 18:22
  • The timeit results on mine are roughly the same... (about .6 for both) – Jon Clements Jun 27 '12 at 18:38
  • I get 0.84s vs 0.96s (2.7.3). But even if the RE were several times faster, I'd only use it if this were a bottleneck. I much prefer writing in Python. – DSM Jun 27 '12 at 18:39
  • @JonClements: On what version of python? My timings were on 2.6, on Mac OS 10.7. On 2.7 the results are a little closer (0.95 vs 1.24), but the regular expression is still a little more efficient. Timings vary far more when the digits are *late* in the string. – Martijn Pieters Jun 27 '12 at 18:44
  • Python 2.7.3 OS: Linux 3.2.0-23 (Ubuntu 12.04) - still - it's concerning that 2.6 is nearly half the speed using `any`! – Jon Clements Jun 27 '12 at 18:48
  • I s'pose `re` is more optimised and `any` has to be generic so can't make assumptions.... Interesting to know though! – Jon Clements Jun 27 '12 at 18:49
  • @JonClements: just tested again with 2.7.3 on Mac and the 'late digit' test shows re to be 3.5 times as fast still. – Martijn Pieters Jun 27 '12 at 18:51
  • @JonClements: my guess is that the `isdigit()` test is inefficient; the more characters it is called on before a digit is found, the worse it gets. – Martijn Pieters Jun 27 '12 at 18:56
  • @Martijn Pieters: You're right. `d.isdigit()` is significantly slower than just testing `d in "0123456789"`. It might be because isdigit works on things such as "123", testing that *all* characters are digits. – Ponkadoodle Jun 27 '12 at 19:38
  • Hi, I am trying to do the same as above: I need to check whether a string contains a digit, but using the method above I get "global name '_digits' is not defined". – Shiva Krishna Bavandla Aug 07 '13 at 07:03
  • Seems you forgot a line then. – Martijn Pieters Aug 07 '13 at 07:32
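Related to the `isdigit()` discussion: the checks on this page don't accept exactly the same characters. `str.isdigit()` is true for any character with a digit numeric type, including non-decimal ones such as superscript digits; `\d` (Python 3, `str` pattern) matches Unicode decimal digits only; and `c in '0123456789'` accepts ASCII digits only. A small Python 3 comparison (the helper names are illustrative):

```python
import re

_digit_re = re.compile(r'\d')


def by_isdigit(c):
    return c.isdigit()


def by_membership(c):
    # Only the ten ASCII digits
    return c in '0123456789'


def by_regex(c):
    # Unicode decimal digits (category Nd) for str patterns
    return _digit_re.match(c) is not None


# '7', SUPERSCRIPT TWO, ARABIC-INDIC DIGIT THREE
for c in ['7', '\u00b2', '\u0663']:
    print(repr(c), by_isdigit(c), by_membership(c), by_regex(c))
```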

Use the `any` function, passing in an iterable (here a generator expression).
If any element is true (i.e. is a digit, in this case), `any` returns True; otherwise it returns False. https://docs.python.org/library/functions.html#any

def contains_digits(s):
    return any(char.isdigit() for char in s)

If you're concerned about performance, though, your current method is actually faster.
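Like the loop in the question, `any()` stops at the first true value, so it does not scan the rest of the string once a digit is found. A quick way to see this, using a hypothetical `recording_isdigit` helper that logs which characters were checked:

```python
def contains_digits(s):
    return any(char.isdigit() for char in s)


# Hypothetical helper that records every character it is asked about
checked = []


def recording_isdigit(char):
    checked.append(char)
    return char.isdigit()


result = any(recording_isdigit(char) for char in 'ab1cd')
print(result)   # True
print(checked)  # ['a', 'b', '1']: 'c' and 'd' were never examined
```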

twasbrillig
Ponkadoodle
  • Unfortunately, in this particular case this runs at half the speed of a regular expression. – Martijn Pieters Jun 27 '12 at 18:27
  • Thanks, didn't know about the 'any' function. So it appears that python strings are iterable, and 'for char in s' will go through each char in the string? – aensm Jun 27 '12 at 18:28
  • @aensm: Indeed, python strings are [sequences too](http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange). – Martijn Pieters Jun 27 '12 at 18:31
  • @Martijn Pieters: Thanks for showing the benchmark. I would never have expected a regular expression to outperform even unoptimized code like this. I was not thinking in terms of performance - if I was, I would have avoided `any` and written something like what I'm about to edit into my post, which outperforms the regular expression solution (in 32 bit Python 2.6 on Win7) by about 1.062 vs 1.254. – Ponkadoodle Jun 27 '12 at 19:02
  • @Wallacoloo: I'd have expected `any` to be just as fast as that, actually. – Martijn Pieters Jun 27 '12 at 19:03
  • @Wallacoloo: Updated the timings. The `if` + `.isdigit` combo is only competitive if the first digit is found early in the string. – Martijn Pieters Jun 27 '12 at 19:10
  • @Martijn Pieters: Makes sense. I had no clue regular expressions were that fast! Although it makes sense because that solution spends all its time executing C code rather than a mix of interpreter and library code. Benchmarking is always fun :) – Ponkadoodle Jun 27 '12 at 19:17
  • @Wallacoloo: REs are fast *when used correctly*, but often misunderstood and used for non-regular situations. – Martijn Pieters Jun 27 '12 at 19:19

After reading the discussion above, I was curious about the performance of a set-based version like this:

def contains_digit(s, digits=set('0123456789')):
    return bool(digits.intersection(s))

In my testing, this was slightly faster on average than the re version on one computer and slightly slower on another (?). Just for fun, I compared some other versions as well.

import math
import re
import timeit


def contains_digit_set_intersection(s, digits=set('0123456789')):
    return bool(digits.intersection(s))


def contains_digit_iter_set(s, digits=set('0123456789')):
    for c in s:
        if c in digits:
            return True
    return False


def contains_digit_iter_str(s, digits='0123456789'):
    for c in s:
        if c in digits:
            return True
    return False


def contains_digit_re(s, digits=re.compile(r'\d')):
    return bool(digits.search(s))


def print_times(func, times):
    name = func.__name__
    average = sum(times) / len(times)
    formatted_times = ' '.join('{:.3f}'.format(t) for t in times)
    message = '{name:<31} {times} ~{average:.3f}'
    print(message.format(name=name, times=formatted_times, average=average))


funcs = [
    contains_digit_set_intersection,
    contains_digit_iter_set,
    contains_digit_iter_str,
    contains_digit_re,
]


cases = [
    '1bcdefg7',
    'abcdefg7',
    'abcdefgh',
    '0798237 sh 523-123-asdjlh',
    'asdjlhtaheoahueoaea 11 thou',
]


for func in funcs:
    times = []
    for case in cases:
        func_case = '{func.__name__}("{case}")'.format(func=func, case=case)
        time = timeit.timeit(func_case, globals={func.__name__: func})
        times.append(time)
    print_times(func, times)

Sample runs on the two computers (seconds for each case, in the order listed in `cases`, then the ~average):

contains_digit_set_intersection 0.744 0.731 0.724 1.227 1.113 ~0.908
contains_digit_iter_set         0.264 0.541 0.566 0.260 1.068 ~0.540
contains_digit_iter_str         0.272 0.649 0.632 0.274 1.211 ~0.607
contains_digit_re               0.748 0.854 0.679 0.744 1.006 ~0.806

contains_digit_set_intersection 0.860 0.870 0.855 1.456 1.357 ~1.080
contains_digit_iter_set         0.285 0.613 0.617 0.307 1.163 ~0.597
contains_digit_iter_str         0.295 0.748 0.799 0.288 1.595 ~0.745
contains_digit_re               1.157 1.236 0.927 1.086 1.450 ~1.171
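A variant that wasn't measured here: `set.isdisjoint()` accepts any iterable and doesn't build an intersection set, and in CPython it can return as soon as it finds a common element (an implementation detail, not a documented guarantee). A sketch that could be added to `funcs`; the name `contains_digit_isdisjoint` and the shared `DIGITS` constant are hypothetical, and no timings are claimed for it:

```python
DIGITS = frozenset('0123456789')


def contains_digit_set_intersection(s, digits=DIGITS):
    # Builds the full intersection set before testing whether it is empty
    return bool(digits.intersection(s))


def contains_digit_isdisjoint(s, digits=DIGITS):
    # isdisjoint() only answers "no elements in common?", without building a new set
    return not digits.isdisjoint(s)


print(contains_digit_isdisjoint('abcdefg7'))  # True
print(contains_digit_isdisjoint('abcdefgh'))  # False
```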
  • For me it looks like `contains_digit_iter_set` should actually look like this: `for d in digits: if d in s: return True`. Or at least that should be taken into consideration. – Hacker Jun 13 '21 at 22:27
  • Also it might be wiser to define the `digits` outside of the function or use `string.digits` or use `c.isdigit()` instead of `c in digits` – Hacker Jun 13 '21 at 22:35
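The loop suggested in the comments above (iterate over the ten digits and use `d in s`) could be sketched as one more entry for `funcs`, using `string.digits` as also suggested. It is the same idea as the `any(d in s for d in '0123456789')` one-liner. The function name is hypothetical and it hasn't been benchmarked here: each `d in s` is a C-level substring scan, but when `s` has no digits it scans `s` ten times, so how it compares to the per-character loops likely depends on the input.

```python
import string


def contains_digit_scan_digits(s, digits=string.digits):
    # For each of '0'..'9', let the substring search look through s
    for d in digits:
        if d in s:
            return True
    return False


print(contains_digit_scan_digits('abcdefg7'))  # True
print(contains_digit_scan_digits('abcdefgh'))  # False
```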

For those seeking a shorter solution: `any(d in s for d in '0123456789')`

Nickmaovich