Check if string matches pattern

Question

How do I check if a string matches the following pattern?

Uppercase letter, number(s), uppercase letter, number(s)...

Example:

These would match:
```
A1B2
B10L1
C1N200J1
```
These wouldn't ('^' points to the problem):
```
a1B2
^
A10B
   ^
AB400
^
```

In your third example, the problem should be with `B` and not with `A`. — Burhan Khalid, Sep 26 '12 at 05:33
maybe it's a typo error on the problem. both `A` and `B` are small letters right? `A10b` and `aB400`? — John Woo, Sep 26 '12 at 05:34
@Burhan, The problem is with A because B has numbers next to it and A doesn't — DanielTA, Sep 26 '12 at 05:39

CrazyCasta · Accepted Answer · 2021-09-05T21:02:28.503

682

import re
pattern = re.compile("^([A-Z][0-9]+)+$")
pattern.match(string)
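
As a quick sanity check, the pattern above can be run against the examples from the question. This is only a minimal sketch; the helper name `matches` is hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer; the raw string is only a style choice here.
pattern = re.compile(r"^([A-Z][0-9]+)+$")

def matches(s):
    """Return True if s consists of one or more (uppercase letter, digits) pairs."""
    return pattern.match(s) is not None

# The three valid and three invalid examples from the question
for s in ["A1B2", "B10L1", "C1N200J1", "a1B2", "A10B", "AB400"]:
    print(s, matches(s))
```

The first three strings should print `True` and the last three `False`.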

edited Sep 05 '21 at 21:02

answered Sep 26 '12 at 05:30

CrazyCasta



From the docs on `re.match`: `If zero or more characters at the beginning of string match the regular expression pattern`. I just spent like 30 minutes trying to understand why I couldn't match something at the end of a string. Seems like it's not possible with `match`, is it? For that, `re.search(pattern, my_string)` works though. – conradkleinespel Nov 11 '16 at 15:52

@conradk Yes, you're right, I think there's something like an implied `^` at the beginning when you use `match`. I think it's a bit more complicated than that very simple explanation, but I'm not clear. You are correct that it does start from the beginning of the string though. – CrazyCasta Nov 11 '16 at 20:10
I edited your answer, because it only makes sense with `search()` in this context. – Robo Robok Feb 21 '21 at 12:16
Yes, but that's what the questioner wants. I'm not sure what you mean by "only makes sense with `search()`". It works perfectly fine with match. – CrazyCasta Sep 05 '21 at 21:03
To be clear: You probably want to check whether pattern.match returns something; luckily `None` is falsy, so you can just do `if pattern.match(string):` – Jun 22 '22 at 19:59

nehem · Answer 2 · 2019-01-23T23:06:03.060

365

One-liner: re.match(r"pattern", string) # No need to compile

>>> import re
>>> if re.match(r"hello[0-9]+", 'hello1'):
...     print('Yes')
... 
Yes

You can evaluate it as a bool if needed:

>>> bool(re.match(r"hello[0-9]+", 'hello1'))
True

edited Jan 23 '19 at 23:06

answered Jul 13 '16 at 03:09

nehem



That's weird. Why can you use `re.match` in the context of an `if`, but you have to use `bool` if you're using it elsewhere? – LondonRob Mar 13 '18 at 13:59

Careful with `re.match`. It only matches at the start of a string. Have a look at `re.search` instead. – LondonRob Mar 13 '18 at 14:02

@LondonRob probably because `if` checks for the match not being `None`. – Dennis Mar 18 '19 at 13:17
There's a big need to compile to make sure there are no errors in the regular expressions like bad character range errors – Suh Fangmbeng May 06 '20 at 17:10

@SuhFangmbeng Compilation is useful for efficiency when the same `re` is used in more than one place. In terms of errors, `.match` would throw the same error that `.compile` does. It's perfectly safe to use. – nehem May 06 '20 at 21:58

@nehem actually all of the regex functions in `re` module compile and cache the patterns. Therefore there is absolutely no efficiency gain using compile and then match than just directly calling `re.match`. All of these functions call the internal function `_compile` (including `re.compile`) which does the caching to a python dictionary. – miksus Jun 29 '21 at 21:12
If you're on Python 3.8+ you can use the walrus operator if you need to access the match object: `if (match := re.match(r"hello([0-9]+)", string)): print(match[1])` – Peter Aug 11 '22 at 12:25
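
To illustrate the point about errors raised in the comments above: an invalid pattern raises `re.error` whether it is passed to `re.compile` or directly to `re.match`. A minimal sketch, using a deliberately reversed character range as the bad pattern (the helper name `raises_re_error` is made up for this example):

```python
import re

bad_pattern = "[z-a]"  # reversed range, rejected by the regex parser

def raises_re_error(func, *args):
    """Return True if calling func(*args) raises re.error."""
    try:
        func(*args)
    except re.error:
        return True
    return False

# Both entry points parse the pattern, so both report the same problem
print(raises_re_error(re.compile, bad_pattern))       # True
print(raises_re_error(re.match, bad_pattern, "abc"))  # True
```

So skipping `re.compile` does not hide pattern errors; it only moves them to the first call that uses the pattern.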

score 56 · Answer 3 · edited Feb 12 '15 at 10:54


Please try the following:

import re

name = ["A1B1", "djdd", "B2C4", "C2H2", "jdoi","1A4V"]

# Match names.
for element in name:
    # Raw string so that \d reaches the regex engine without an invalid-escape warning
    m = re.match(r"(^[A-Z]\d[A-Z]\d)", element)
    if m:
        print(m.groups())

edited Feb 12 '15 at 10:54

Edd


answered Feb 12 '15 at 10:36

sumeet agrawal



This is the only case that returns the match which is required for getting groups. Best answer in my opinion. – Rick Smith Dec 12 '16 at 20:59
best answer among other answers – CK5 Jun 03 '22 at 11:16

Marc Cohen · Answer 4 · 2012-09-26T05:32:15.153

31

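import re
import sys

# Raw string so that \d reaches the regex engine unchanged
prog = re.compile(r'([A-Z]\d+)+')

for line in sys.stdin:
    # fullmatch on the stripped line, so a partial match such as "A10B" is rejected
    if prog.fullmatch(line.rstrip('\n')):
        print('matched')
    else:
        print('not matched')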

edited Sep 26 '12 at 05:32

answered Sep 26 '12 at 05:31

Marc Cohen


crypdick · Answer 5 · 2022-11-22T17:00:26.340

As stated in the comments, all of the answers that use re.match implicitly match only at the start of the string. re.search is needed if you want to find the pattern anywhere in the string.

import re

pattern = re.compile("([A-Z][0-9]+)+")

# finds match anywhere in string
bool(re.search(pattern, 'aA1A1'))  # True

# match only checks at the start of the string, even though the pattern has no ^ anchor
bool(re.match(pattern, 'aA1A1'))  # False

If you need the full string to exactly match the regex, see @Ali Sajjad's answer using re.fullmatch

Credit: @LondonRob and @conradkleinespel in the comments.

score 16 · Answer 6 · answered Aug 18 '22 at 07:19

Careful! (Maybe you want to check if FULL string matches)

re.match(...) is not enough if you want to match the full string, because it only anchors at the start.

For example:

re.match("[a-z]+", "abcdef") ✅ will give a match
But! re.match("[a-z]+", "abcdef 12345") ✅ will also give a match, because the start of the string matches (probably not what you want when checking whether the entire string is valid)

Solution

Use re.fullmatch(...). This will only match if the entire string matches the pattern.

if re.fullmatch("[a-z]+", my_string):
    print("Yes")

Example

re.fullmatch("[a-z]+", "abcdef") ✅ Yes
re.fullmatch("[a-z]+", "abcdef 12345") ❌ No

One liner: bool(re.fullmatch("[a-z]+", my_string))
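
One subtle case where `re.fullmatch` differs from anchoring a pattern with `$`: without `re.MULTILINE`, `$` also matches just before a single trailing newline, so a line read from a file or stdin can slip through. A small sketch with a made-up sample string:

```python
import re

pattern = r"[a-z]+"
s = "abcdef\n"  # e.g. a line read from a file, newline still attached

# `$` matches at the very end of the string or just before a trailing newline
print(bool(re.match(pattern + "$", s)))             # True
# fullmatch requires the whole string, newline included, to match
print(bool(re.fullmatch(pattern, s)))               # False
# stripping the newline first gives the expected result
print(bool(re.fullmatch(pattern, s.rstrip("\n"))))  # True
```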

Thanks. This is exactly what I wanted to see for "check if string matches" — mishadr, Oct 11 '22 at 16:56

score 12 · Answer 7 · answered Sep 26 '12 at 05:35


Regular expressions make this easy:

[A-Z] will match exactly one character between A and Z

\d+ will match one or more digits

() groups things (and also captures them, but for now just think of it as grouping)

+ matches one or more of the preceding item
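
Putting those pieces together might look like the sketch below. The sample string is taken from the question; using `re.fullmatch` and `re.findall` here is an illustrative choice, not something this answer prescribes.

```python
import re

s = "C1N200J1"

# ([A-Z]\d+)+ : one or more repetitions of "uppercase letter, then digits"
m = re.fullmatch(r"([A-Z]\d+)+", s)
print(bool(m))     # True
# A repeated group only keeps its last repetition
print(m.group(1))  # J1

# To list every letter/number pair, findall with the inner pattern can be used
pairs = re.findall(r"[A-Z]\d+", s)
print(pairs)       # ['C1', 'N200', 'J1']
```

Note that for `str` patterns `\d` also matches non-ASCII decimal digits unless `re.ASCII` is set; `[0-9]` is the stricter choice if only ASCII digits should be accepted.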

answered Sep 26 '12 at 05:35

Joran Beasley


score 12 · Answer 8 · answered Sep 26 '12 at 06:10


  
import re

ab = re.compile("^([A-Z][0-9]+)+$")  # [0-9]+ rather than [0-9]{1}, so multi-digit numbers such as B10 match
ab.match(string)

I believe that should work for an uppercase, number pattern.

answered Sep 26 '12 at 06:10

Kneel-Before-ZOD


mike rodent · Answer 9 · 2023-03-09T08:40:04.473

Ali Sajjad's answer should be the default, i.e. fullmatch to avoid false positives.

However, it's also important to know that "yes, it's a match" always means the result is not None.

The two possibilities are therefore:

if re.fullmatch("[a-z]+", my_string) is not None:

or, as in Ali's answer:

if bool(re.fullmatch("[a-z]+", my_string)):

To my way of thinking both of these are really quite horribly unreadable. So a simple utility function is needed for readability:

def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL): # or "is_full_match", as desired
    return re.fullmatch(pattern, string, flags) is not None

Those 2 flags are (usually) the most helpful default flags settings in my experience, rather than "0".

In practice, of course, you may need to examine the Match object delivered by re.fullmatch. But for cases where you just need to find whether there's a match...
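
One caveat worth checking before adopting those defaults for the pattern in the question: with `re.IGNORECASE`, `[A-Z]` also matches lowercase letters. The sketch below repeats the utility function so that it runs on its own; the pattern and sample strings come from the question.

```python
import re

def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL):
    return re.fullmatch(pattern, string, flags) is not None

pattern = r"([A-Z][0-9]+)+"

print(is_match(pattern, "a1B2"))           # True: IGNORECASE lets 'a' match [A-Z]
print(is_match(pattern, "a1B2", flags=0))  # False: case-sensitive
print(is_match(pattern, "A1B2", flags=0))  # True
```

For a case-sensitive check like the one asked about, passing `flags=0` (or changing the default) would be needed.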

score 0 · Answer 10 · answered Jun 03 '23 at 02:47

Just want to point out that for strings without line breaks (\n), one can also use the anchors \A and \Z for the beginning and end of the string, respectively.

import re
pat = re.compile(r'\A([A-Z][0-9]+)+\Z')
pat.match('A1B2')   # match
pat.match('A1B2a')  # no match

This makes a difference if the string contains multiple lines and you want to match the pattern in later lines of the string.

`match` vs `search` vs `fullmatch`. Which is appropriate?

re.search is the most general of the three. As others have said, re.match() checks for a match only at the beginning of the string. re.search() can mimic that by prepending \A to whatever pattern is used. On the other hand, re.fullmatch() checks whether the entire string is a match, which can again be mimicked by re.search() by prepending \A and appending \Z to the pattern. The example below may help illustrate this point.

# prepending \A to pattern makes `search` behave similar to `match`
s1 = '1B10L1'
pattern1 = r'([A-Z][0-9]+)+'

re.match(pattern1, s1)             # no match
re.search(pattern1, s1)            # match
re.search(fr"\A{pattern1}", s1)    # no match     <--- with \A prepended (behaves same as re.match)


# `match` only checks at the beginning and doesn't care about the end of the string
s2 = 'B10L1a'

re.match(pattern1, s2)             # match
re.match(fr"{pattern1}\Z", s2)     # no match     <--- with \Z appended (behaves same as re.fullmatch)
re.search(fr"\A{pattern1}\Z", s2)  # no match     <--- have the pattern between \A and \Z to mimic fullmatch
re.fullmatch(pattern1, s2)         # no match

If the string contains multiple lines and the re.MULTILINE (re.M) flag is set, this relation breaks down: fullmatch still has to match the entire string, newlines included, and match only ever starts at the beginning of the first line.

# by flagging multiline, `match` and `search` match the pattern in the first line
s3 = 'B10\nL1'
pattern2 = r'^([A-Z][0-9]+)+$'
re.match(pattern2, s3, re.M)       # match
re.search(pattern2, s3, re.M)      # match
re.fullmatch(pattern2, s3, re.M)   # no match

# the pattern is in the second line but this will not be matched by `match`
s4 = 'a\nB10'
pattern2 = r'^([A-Z][0-9]+)+$'

re.match(pattern2, s4, re.M)              # no match
re.search(pattern2, s4, re.M)             # match
re.search(r'\A([A-Z][0-9]+)+', s4, re.M)  # no match  <--- with `\A` instead of `^` it mimics `match`

To compile or not to compile

If you need to search for a pattern in a single string, there's no need to compile it, since re.search, re.match etc. all call the internal _compile function anyway. However, if you need to search for a pattern in multiple strings, compiling it first can make a noticeable difference performance-wise. For the example in the OP, pre-compiling the pattern and searching through the list was over 2 times faster than not compiling first in the timings below.

from timeit import timeit
setup2 = "import re; lst = ['A1B2', 'B10L1', 'C1N200J1K1', 'a1B2', 'A10B', 'AB400']"
setup1 = setup2 + "; pat = re.compile(r'^([A-Z][0-9]+)+$')"

for _ in range(3):
    t1 = timeit("list(map(pat.match, lst))", setup1)         # pre-compiled
    t2 = timeit("[re.match(r'^([A-Z][0-9]+)+$', x) for x in lst]", setup2)
    print(t2 / t1)
    
# 2.083788080189313
# 2.448126223007598
# 2.43617482049811
