
I'm trying to match strings that meet the following conditions:

  • Length of string is 5 characters
  • Char [0] = Letter/Number
  • Char [1] = Letter
  • Char [2-4] = Number

I don't understand why "22222" matches this expression.

 import re

 p = r'(\w|\d)(\w)(\d){3,}'
 m = re.match(p, "AA012")    # Works as expected
 m.group()                   # --> 'AA012'

 m = re.match(p, "1A222")    # Works as expected
 m.group()                   # --> '1A222'

 m = re.match(p, "22222")    # Does NOT work as expected!
 m.group()                   # --> '22222'

What am I missing in my regex syntax?

Martijn Pieters
Biff

1 Answer


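\w matches letters and digits (as well as underscores), so in "22222" the first two digits satisfy (\w|\d)(\w) and the remaining three satisfy (\d){3,}. The \d alternative in (\w|\d) is redundant for the same reason.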
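A quick way to confirm this is to look at what each group of the question's pattern captured for "22222", and to test \w against a few single characters. This is a small illustrative sketch; the sample characters are arbitrary choices, not taken from the question.

```python
import re

# The pattern from the question
p = r'(\w|\d)(\w)(\d){3,}'

m = re.match(p, "22222")
# Groups 1 and 2 each consumed a digit, because \w accepts digits.
# Group 3 is a repeated group, so it only keeps its last repetition.
print(m.group(0), m.groups())

# \w matches letters, digits and the underscore, but not punctuation like '-'
for ch in ("a", "Z", "7", "_", "-"):
    print(repr(ch), re.fullmatch(r'\w', ch) is not None)
```

The first line printed is `22222 ('2', '2', '2')`, which shows the letter-only position was filled by a digit.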

Use [a-zA-Z] if you want to match only letters:

r'\w[a-zA-Z]\d{3,}'

which matches a letter or digit (or an underscore), then a letter, then 3 or more digits.

Demo:

>>> import re
>>> p = r'\w[a-zA-Z]\d{3,}'
>>> re.match(p, "22222")
>>> re.match(p, "AA012")
<_sre.SRE_Match object at 0x105aca718>
>>> re.match(p, "1A222")
<_sre.SRE_Match object at 0x105aca780>
>>> re.match(p, "_A222")
<_sre.SRE_Match object at 0x105aca718>

If the underscore is a problem, use:

r'[a-zA-Z\d][a-zA-Z]\d{3}'
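Keep in mind that re.match() only anchors at the start of the string, so with re.match() this pattern would also accept a longer string such as "AA0123". A minimal sketch that also enforces the exactly-five-characters rule, assuming Python 3.4+ for re.fullmatch(); the helper name is_valid is hypothetical:

```python
import re

# Letter or digit, then a letter, then exactly three digits.
# [a-zA-Z] and \d are spelled out so the underscore is excluded.
PATTERN = re.compile(r'[a-zA-Z\d][a-zA-Z]\d{3}')

def is_valid(s):
    """Hypothetical helper: True only if the whole string fits the pattern."""
    # fullmatch() requires the match to span the entire string,
    # which enforces the 5-character length rule.
    return PATTERN.fullmatch(s) is not None

for s in ("AA012", "1A222", "22222", "_A222", "AA0123"):
    print(s, is_valid(s))
```

This prints True for "AA012" and "1A222" and False for the other three. On older versions, re.match(r'[a-zA-Z\d][a-zA-Z]\d{3}\Z', s) gives the same result. Note that in Python 3, \d also matches non-ASCII digits; use [0-9] instead if only ASCII digits should be allowed.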
Martijn Pieters