523

How do I check if a string matches the following pattern?

Uppercase letter, number(s), uppercase letter, number(s)...

Example:

  • These would match:
    A1B2
    B10L1
    C1N200J1
    
  • These wouldn't ('^' points to problem)
    a1B2
    ^
    A10B
       ^
    AB400
    ^
    
cottontail
DanielTA

10 Answers

682
import re
pattern = re.compile("^([A-Z][0-9]+)+$")
pattern.match(string)
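
As a minimal sketch, the pattern above can be run against the examples from the question. The helper name `matches_pattern` is hypothetical and not part of the original answer; it only wraps the `None` check.

```python
import re

# Same pattern as in the answer: one or more (uppercase letter + digits) groups,
# anchored at both ends so the whole string has to fit.
pattern = re.compile(r"^([A-Z][0-9]+)+$")

def matches_pattern(s):
    """Hypothetical helper: True if s is made only of letter/number pairs."""
    return pattern.match(s) is not None

# Examples taken from the question.
for s in ["A1B2", "B10L1", "C1N200J1", "a1B2", "A10B", "AB400"]:
    print(s, matches_pattern(s))
```

The first three strings should report True and the last three False.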
CrazyCasta
  • 69
    From the docs on `re.match`: `If zero or more characters at the beginning of string match the regular expression pattern`. I just spent like 30 minutes trying to understand why I couldn't match something at the end of a string. Seems like it's not possible with `match`, is it? For that, `re.search(pattern, my_string)` works though. – conradkleinespel Nov 11 '16 at 15:52
  • 4
    @conradk Yes, you're right, I think there's something like an implied `^` at the beginning when you use `match`. I think it's a bit more complicated than that very simple explanation, but I'm not clear. You are correct that it does start from the beginning of the string though. – CrazyCasta Nov 11 '16 at 20:10
  • I edited your answer, because it only makes sense with `search()` in this context. – Robo Robok Feb 21 '21 at 12:16
  • Yes, but that's what the questioner wants. I'm not sure what you mean by "only makes sense with `search()`". It works perfectly fine with match. – CrazyCasta Sep 05 '21 at 21:03
  • To be clear: You probably want to check if pattern.match returns something; luckily a match object is truthy and "None" is falsy, so you can just do "if pattern.match(string):" –  Jun 22 '22 at 19:59
365

One-liner: re.match(r"pattern", string) # No need to compile

>>> import re
>>> if re.match(r"hello[0-9]+", 'hello1'):
...     print('Yes')
... 
Yes

You can evaluate it as a bool if needed

>>> bool(re.match(r"hello[0-9]+", 'hello1'))
True
nehem
  • 1
    That's weird. Why can you use `re.match` in the context of an `if`, but you have to use `bool` if you're using it elsewhere? – LondonRob Mar 13 '18 at 13:59
  • 62
    Careful with `re.match`. It only matches at the start of a string. Have a look at `re.search` instead. – LondonRob Mar 13 '18 at 14:02
  • 3
    @LondonRob probably because `if` checks for the match not being `None`. – Dennis Mar 18 '19 at 13:17
  • There's a big need to compile to make sure there are no errors in the regular expressions like bad character range errors – Suh Fangmbeng May 06 '20 at 17:10
  • 2
    @SuhFangmbeng Compilation is useful for efficiency when the same `re` is used in more than one place. In terms of errors, `.match` would throw the same error that `.compile` does. It's perfectly safe to use. – nehem May 06 '20 at 21:58
  • 2
    @nehem actually all of the regex functions in `re` module compile and cache the patterns. Therefore there is absolutely no efficiency gain using compile and then match than just directly calling `re.match`. All of these functions call the internal function `_compile` (including `re.compile`) which does the caching to a python dictionary. – miksus Jun 29 '21 at 21:12
  • If you're in python 3.8+ you can use the walrus operator if you need to access the match object: ```if (match := re.match(r"hello([0-9]+)", string)): print(match[1])``` – Peter Aug 11 '22 at 12:25
56

Please try the following:

import re

name = ["A1B1", "djdd", "B2C4", "C2H2", "jdoi","1A4V"]

# Match names.
for element in name:
    # Raw string so that \d reaches the regex engine unchanged.
    m = re.match(r"(^[A-Z]\d[A-Z]\d)", element)
    if m:
        print(m.groups())
Edd
sumeet agrawal
31
import re
import sys

# Anchored so the whole line must match; `$` also matches just before the trailing newline.
prog = re.compile(r'^([A-Z]\d+)+$')

while True:
    line = sys.stdin.readline()
    if not line:
        break

    if prog.match(line):
        print('matched')
    else:
        print('not matched')
Marc Cohen
19

As stated in the comments, all the answers using re.match implicitly match only at the start of the string. re.search is needed if you want to find a match anywhere in the string.

import re

pattern = re.compile("([A-Z][0-9]+)+")

# finds match anywhere in string
bool(re.search(pattern, 'aA1A1'))  # True

# matches on start of string, even though pattern does not have ^ constraint
bool(re.match(pattern, 'aA1A1'))  # False

If you need the full string to exactly match the regex, see @Ali Sajjad's answer using re.fullmatch

Credit: @LondonRob and @conradkleinespel in the comments.

crypdick
16

Careful! (Maybe you want to check if FULL string matches)

re.match(...) will not work if you want to match the full string, because it only checks that the start of the string matches.

For example;

  • re.match("[a-z]+", "abcdef") ✅ will give a match
  • But! re.match("[a-z]+", "abcdef 12345") ✅ will also give a match because the beginning of the string matches (maybe you don't want that when you're checking if the entire string is valid or not)

Solution

Use re.fullmatch(...). This will only match if the entire string matches the pattern.

if re.fullmatch("[a-z]+", my_string):
    print("Yes")
Example
  • re.fullmatch("[a-z]+", "abcdef") ✅ Yes
  • re.fullmatch("[a-z]+", "abcdef 12345") ❌ No

One liner: bool(re.fullmatch("[a-z]+", my_string))
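
Applied to the pattern from the question, the same idea might look like the sketch below. The names `PAIR_PATTERN` and `is_valid` are hypothetical, chosen only for illustration.

```python
import re

# No ^ or $ needed: fullmatch already requires the entire string to match.
PAIR_PATTERN = r"([A-Z][0-9]+)+"

def is_valid(s):
    """Hypothetical helper: True only if the whole string fits the pattern."""
    return re.fullmatch(PAIR_PATTERN, s) is not None

print(is_valid("B10L1"))   # the whole string fits
print(is_valid("A10B"))    # trailing letter has no number

# For contrast: re.match accepts "A10B" because the prefix "A10" matches.
prefix = re.match(PAIR_PATTERN, "A10B")
print(prefix.group(0))
```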

Ali Sajjad
12

regular expressions make this easy ...

[A-Z] will match exactly one character between A and Z

\d+ will match one or more digits

() group things (and also capture what they matched... but for now just think of them as grouping)

+ matches one or more repetitions of the preceding item
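
Putting those pieces together could look like the following sketch. The variable names (`letter`, `digits`, `pair`) are illustrative assumptions, not part of the original answer.

```python
import re

letter = r"[A-Z]"               # exactly one uppercase letter
digits = r"\d+"                 # one or more digits
pair = f"({letter}{digits})"    # group: a letter followed by its number

# One or more pairs; fullmatch requires the whole string to consist of pairs.
pattern = re.compile(f"{pair}+")

m = pattern.fullmatch("C1N200J1")
print(m.group(0))  # the whole matched string
print(m.group(1))  # a repeated group keeps only its last repetition
```

Note that the group reports only the final pair, so something like `re.findall` would be needed to list every pair.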

Joran Beasley
12
  
import re

ab = re.compile(r"^([A-Z][0-9]+)+$")  # [0-9]+ rather than [0-9]{1}, so multi-digit numbers such as B10 match
ab.match(string)
  


I believe that should work for an uppercase, number pattern.

Kneel-Before-ZOD
0

Ali Sajjad's answer should be the default, i.e. fullmatch to avoid false positives.

However, it's also important to know that you're always checking not None for "yes, it's a match":

The two possibilities are therefore:

if re.fullmatch("[a-z]+", my_string) is not None:

or, as in Ali's answer:

if bool(re.fullmatch("[a-z]+", my_string)):

To my way of thinking both of these are really quite horribly unreadable. So a simple utility function is needed for readability:

def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL): # or "is_full_match", as desired
    return re.fullmatch(pattern, string, flags) is not None

Those 2 flags are (usually) the most helpful default flags settings in my experience, rather than "0".

In practice, of course, you may need to examine the Match object delivered by re.fullmatch. But for cases where you just need to find whether there's a match...
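
For the case where the Match object itself is needed, a minimal sketch might look like this. The use of `re.findall` to list the individual pairs is an illustrative choice, not something the answer prescribes.

```python
import re

# Non-capturing group (?:...) because only the overall match is of interest.
m = re.fullmatch(r"(?:[A-Z][0-9]+)+", "B10L1")

if m is not None:
    whole = m.group(0)   # the matched text
    span = m.span()      # (start, end) indices of the match
    # Split the validated string into its letter/number pairs.
    pairs = re.findall(r"[A-Z][0-9]+", whole)
    print(whole, span, pairs)
```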

mike rodent
0

Just want to point out that for strings without line breaks (\n), one could also use the anchors \A and \Z for the beginning and end of the string, respectively.

import re
pat = re.compile(r'\A([A-Z][0-9]+)+\Z')
pat.match('A1B2')   # match
pat.match('A1B2a')  # no match

This makes a difference if the string contains multiple lines and you want to match the pattern in later lines of the string.
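
One concrete difference between `$` and `\Z` shows up with a trailing newline, as in this small sketch. The names `dollar` and `strict` are hypothetical.

```python
import re

dollar = re.compile(r"^([A-Z][0-9]+)+$")     # $ also matches just before a final \n
strict = re.compile(r"\A([A-Z][0-9]+)+\Z")   # \Z matches only at the very end

s = "A1B2\n"  # e.g. a line read from a file, newline not stripped

print(bool(dollar.match(s)))                     # True: $ tolerates the trailing newline
print(bool(strict.match(s)))                     # False: the newline sits before the end
print(bool(re.fullmatch(r"([A-Z][0-9]+)+", s)))  # False: fullmatch must consume the newline too
```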

match vs search vs fullmatch. Which is appropriate?

re.search is the most general of the three. As others have said, re.match() checks for a match only at the beginning of the string. re.search() can mimic that by prepending \A to whatever pattern is used. On the other hand, re.fullmatch() checks if the entire string is a match, which can again be mimicked by re.search() by prepending \A and appending \Z to whatever pattern is used. The example below may help illustrate this point.

# prepending \A to pattern makes `search` behave similar to `match`
s1 = '1B10L1'
pattern1 = r'([A-Z][0-9]+)+'

re.match(pattern1, s1)             # no match
re.search(pattern1, s1)            # match
re.search(fr"\A{pattern1}", s1)    # no match     <--- with \A prepended (behaves same as re.match)


# `match` only checks at the beginning and doesn't care about the end of the string
s2 = 'B10L1a'

re.match(pattern1, s2)             # match
re.match(fr"{pattern1}\Z", s2)     # no match     <--- with \Z appended (behaves same as re.fullmatch)
re.search(fr"\A{pattern1}\Z", s2)  # no match     <--- have the pattern between \A and \Z to mimic fullmatch
re.fullmatch(pattern1, s2)         # no match

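If the string contains multiple lines and the re.MULTILINE flag is set, this relation breaks down: `^` and `$` then also match at line boundaries, but fullmatch must still consume the entire string, and match still starts at the very beginning, so it only sees the first line.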

# by flagging multiline, `match` and `search` match the pattern in the first line
s3 = 'B10\nL1'
pattern2 = r'^([A-Z][0-9]+)+$'
re.match(pattern2, s3, re.M)       # match
re.search(pattern2, s3, re.M)      # match
re.fullmatch(pattern2, s3, re.M)   # no match

# the pattern is in the second line but this will not be matched by `match`
s4 = 'a\nB10'
pattern2 = r'^([A-Z][0-9]+)+$'

re.match(pattern2, s4, re.M)              # no match
re.search(pattern2, s4, re.M)             # match
re.search(r'\A([A-Z][0-9]+)+', s4, re.M)  # no match  <--- with `\A` instead of `^` it mimics `match`

To compile or not to compile

If you need to search for a pattern in a single string, then there's no need to compile it since re.search, re.match etc. all call the internal _compile function anyway. However, if you need to search for a pattern in multiple strings, then compiling it first can make a noticeable difference performance-wise. For the example in the OP, pre-compiling the pattern and searching through the list was over 2 times faster than not compiling first in the timing below.

from timeit import timeit
setup2 = "import re; lst = ['A1B2', 'B10L1', 'C1N200J1K1', 'a1B2', 'A10B', 'AB400']"
setup1 = setup2 + "; pat = re.compile(r'^([A-Z][0-9]+)+$')"

for _ in range(3):
    t1 = timeit("list(map(pat.match, lst))", setup1)         # pre-compiled
    t2 = timeit("[re.match(r'^([A-Z][0-9]+)+$', x) for x in lst]", setup2)
    print(t2 / t1)
    
# 2.083788080189313
# 2.448126223007598
# 2.43617482049811
cottontail