8

I need to check whether a string is a valid SHA1 string, like

 '418c1f073782a1c855890971ff18794f7a298f6d'

I don't know the rules for that. For example, must it contain both a digit and a letter? Is there a minimum number of digits or letters?

Could anybody suggest a regex for matching this in Python?

Mazdak
skydoor

2 Answers

21

I believe it's faster to avoid using a regex. A SHA-1 digest is conventionally written as a 40-digit hexadecimal number, so if the string is not 40 characters long, or it cannot be parsed as hex, it's not a SHA-1:

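def is_sha1(maybe_sha):
    # A SHA-1 hex digest is always exactly 40 characters long.
    if len(maybe_sha) != 40:
        return False
    try:
        # int() raises ValueError when the string is not valid base 16.
        # Note: it also tolerates a sign, a 0x prefix, whitespace and underscores.
        int(maybe_sha, 16)
    except ValueError:
        return False
    return True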
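
One caveat with the int() approach: Python's int() also accepts a leading sign, a 0x prefix, surrounding whitespace, and (since Python 3.6) underscores between digits, so a few 40-character strings that are not plain hex would still pass. Below is a minimal sketch of a stricter check that still avoids regex. It assumes both lowercase and uppercase hex digits should be accepted; the name is_sha1_strict is hypothetical and not part of the original answer.

```python
import string

# All characters allowed in a hex digest: 0-9, a-f, A-F.
HEX_DIGITS = frozenset(string.hexdigits)

def is_sha1_strict(maybe_sha):
    """Return True only for a str of exactly 40 hex digits (hypothetical helper)."""
    return (
        isinstance(maybe_sha, str)
        and len(maybe_sha) == 40
        and all(ch in HEX_DIGITS for ch in maybe_sha)
    )
```

Whether the stricter check matters depends on where the input comes from; for trusted input the shorter int() version may be good enough.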
mVChr
  • 1
    I like this. very clever! – lollercoaster Aug 26 '15 at 18:47
  • 1
    Isn't using try-catch blocks for control flow supposed to be an anti-pattern? Honest question, not criticizing, just trying to learn. – slashCoder Nov 02 '18 at 21:18
  • 1
    @slashCoder I don't know, I'm a greybeard, but it's more performant. – mVChr Nov 06 '18 at 18:40
  • 1
    In Python specifically, there is a style principle of "easier to ask for forgiveness than permission", and some actually label NOT using try-except as the anti-pattern (https://docs.quantifiedcode.com/python-anti-patterns/readability/asking_for_permission_instead_of_forgiveness_when_working_with_files.html), but take it with a grain of salt. Other communities are not as warm to the idea. – AlanSE Jul 02 '19 at 23:58
  • 1
    Pretty cool! This is about 5 times more performant than regex (even if the pattern is compiled in advance), as measured by `timeit` on a Mac using Python 3.7.7. – haridsv Nov 25 '20 at 11:36
7

Use this regex:

\b[0-9a-f]{40}\b

A SHA-1 digest in hex form is a string of exactly 40 hexadecimal characters, which is what this pattern matches (lowercase digits only). You could also convert it to an integer, as suggested in another answer, but this is the regex solution.

An example:

import re

# match() anchors at the start only; \b still allows trailing text after the hash.
pattern = re.compile(r'\b[0-9a-f]{40}\b')
match = pattern.match('418c1f073782a1c855890971ff18794f7a298f6d')
if match:
    print(match.group(0))  # 418c1f073782a1c855890971ff18794f7a298f6d
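
If the goal is to validate that the whole string is a SHA-1 hex digest, rather than to find one at the start of a longer text, anchoring the pattern to the entire input may be a better fit. Here is a sketch using re.fullmatch (available since Python 3.4). Accepting uppercase digits is an assumption, and the names SHA1_RE and looks_like_sha1 are hypothetical.

```python
import re

# Exactly 40 hex digits; uppercase is accepted here as an assumption.
SHA1_RE = re.compile(r'[0-9a-fA-F]{40}')

def looks_like_sha1(text):
    """Return True if the entire string is 40 hex digits (hypothetical helper)."""
    # fullmatch() requires the pattern to cover the whole string,
    # so trailing text or a trailing newline is rejected.
    return SHA1_RE.fullmatch(text) is not None
```

Unlike a pattern ending in `$`, fullmatch() does not let a trailing newline slip through.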
lollercoaster