675

What is the difference between the search() and match() functions in the Python re module?

I've read the Python 2 documentation (Python 3 documentation), but I never seem to remember it.

cottontail
Daryl Spitzer
  • 1
    The way I remember it is that "search" evokes the image in my mind of an explorer with binoculars searching off into the distance, just like `search` will search to the end of the string off in the distance. – Andy Lester Nov 13 '22 at 17:56

10 Answers

647

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note: If you want to locate a match anywhere in string, use search() instead.

re.search searches the entire string, as the documentation says:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

So if you need to match at the beginning of the string, or to match the entire string (by ending the pattern with $ or \Z), use match. It is faster than an equivalent anchored search. Otherwise use search.

The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The “match” operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

Now, enough talk. Time to see some example code:

# example code:
import re

string_with_newlines = """something
someotherthing"""

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother',
               string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines,
               re.MULTILINE))  # also won't match
print(re.search('someother',
                string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines,
                re.MULTILINE))  # also finds something

m = re.compile('thing$', re.MULTILINE)

print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
# The flag is already compiled into m; the second positional argument
# of a compiled pattern's search() is pos, not flags, so it is omitted here.
print(m.search(string_with_newlines))  # also matches
Vin
nosklo
  • What about strings containing newlines? – Daryl Spitzer Oct 08 '08 at 01:01
  • even with strings containing newlines, match() matches only at the BEGINNING of the string. – nosklo Oct 08 '08 at 01:05
  • That's the answer I was hoping for! (Especially now that you provided an example.) – Daryl Spitzer Oct 08 '08 at 01:19
  • 35
    Why would anyone use limited `match` rather than more general `search` then? is it for speed? – Alby Jul 23 '14 at 02:55
  • 23
    @Alby match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples. – Ivan Bilan May 24 '16 at 09:34
  • 45
    Well, that's goofy. Why call it `match`? Is it a clever maneuver to seed the API's with unintuitive names to force me to read the documentation? I still won't do it! Rebel! – Sammaron Sep 16 '16 at 15:14
  • 2
    @ivan_bilan `match` looks a bit `faster` than search when using the same regular expression but your example seems wrong according to a performance test: https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match/49710946#49710946 – baptx Jan 21 '19 at 18:56
  • 2
    When using a regular expression beginning with '^', and with ```MULTILINE``` unspecified, is ```match``` the same as ```search``` (produce the same result)? – Zitao Wang Aug 19 '19 at 14:23
  • 1
    @Sammaron: `search` searches in the string for the pattern, and `match` sees if the (entire) string matches the pattern. I don't find this entirely unintuitive. – ThePopMachine Sep 09 '21 at 15:54
  • @ThePopMachine: `match` sees if the string *begins* with the pattern. It doesn't have to match the whole string. The `fullmatch` method (introduced much later) is the only one that requires the pattern to match the whole string (by anchoring at both beginning and end). – ShadowRanger Aug 30 '22 at 18:50
  • Why is match not named match_start or something? I want my 2 hours back! – Carl Chang Sep 16 '22 at 11:06
  • They changed a lot of [this](https://docs.python.org/3/library/re.html) in the most recent python3 versions since the last time I used it – Nathan majicvr.com Oct 03 '22 at 17:48
  • @CarlChang yeah it's annoying – Nathan majicvr.com Oct 03 '22 at 17:49
140

search ⇒ find something anywhere in the string and return a match object.

match ⇒ find something at the beginning of the string and return a match object.
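
A minimal sketch of that summary, using a made-up sample string (`text` is just an illustration, not from the question):

```python
import re

text = "say hello"

# search scans forward and finds "hello" starting at index 4.
found = re.search("hello", text)
print(found.span())                    # (4, 9)

# match only tries index 0, where the text reads "say ...", so it fails.
print(re.match("hello", text))         # None

# match succeeds when the pattern fits at index 0.
print(re.match("say", text).group())   # say
```

In both cases the return value is a match object on success and None on failure, so either call can be used directly in an `if`.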

Ray Toal
Dhanasekaran Anbalagan
101

match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.

This comment from @ivan_bilan under the accepted answer above got me wondering whether such a hack actually speeds anything up, so let's find out how many tons of performance you really gain.

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, ..., 10M words) which gave me the following plot:

[Figure: match vs. search regex speedtest line plot]

As you can see, searching for the pattern 'python' is faster than matching the pattern '(.*?)python(.*?)'.

Python is smart. Avoid trying to be smarter.
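
For anyone who wants to rerun the comparison on a smaller scale, here is a rough sketch using `timeit` instead of `time.time()`. The sample size, the seed, and the trick of splicing "python" into every tenth word are my own assumptions so that both outcomes occur; absolute timings will differ by machine and Python version.

```python
import random
import re
import string
import timeit

random.seed(0)  # fixed seed so the sample is reproducible (assumption for this sketch)

# Small sample of 10-letter words; every tenth one gets "python" spliced in.
words = ["".join(random.choices(string.ascii_lowercase, k=10)) for _ in range(1000)]
words = [w[:2] + "python" + w[8:] if i % 10 == 0 else w for i, w in enumerate(words)]

search_pat = re.compile("python")
match_pat = re.compile("(.*?)python(.*?)")

# Both approaches flag exactly the same words ...
search_hits = [bool(search_pat.search(w)) for w in words]
match_hits = [bool(match_pat.match(w)) for w in words]
print(search_hits == match_hits)

# ... so the only open question is speed.
t_search = timeit.timeit(lambda: [search_pat.search(w) for w in words], number=20)
t_match = timeit.timeit(lambda: [match_pat.match(w) for w in words], number=20)
print(f"search: {t_search:.4f}s  match: {t_match:.4f}s")
```

The equality check matters: a speed comparison is only meaningful if both calls do the same job on the data.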

Jeyekomon
  • 38
    +1 for actually investigating the assumptions behind a statement meant to be taken at face value -- thanks. – Robert Dodier Oct 30 '18 at 16:37
  • 2
    Indeed the comment of @ivan_bilan looks wrong but the `match` function is still faster than the `search` function if you compare the same regular expression. You can check in your script by comparing `re.search('^python', word)` to `re.match('python', word)` (or `re.match('^python', word)` which is the same but easier to understand if you don't read the documentation and seems not to affect the performance) – baptx Jan 21 '19 at 18:36
  • 3
    @baptx I disagree with the statement that the `match` function is generally faster. The `match` is faster when you want to search **at the beginning** of the string, the `search` is faster when you want to search **throughout** the string. Which corresponds with the common sense. That's why @ivan_bilan was wrong - he used `match` to search throughout the string. That's why you are right - you used `match` to search at the beginning of the string. If you disagree with me, try to find regex for `match` that is faster than `re.search('python', word)` and does the same job. – Jeyekomon Jan 22 '19 at 10:57
  • @baptx Also, as a footnote, the `re.match('python')` **is** marginally faster than `re.match('^python')`. It has to be. – Jeyekomon Jan 22 '19 at 11:26
  • 1
    @Jeyekomon yes that's what I meant, `match` function is a bit faster if you want to search at the beginning of a string (compared to using `search` function to find a word at the beginning of a string with `re.search('^python', word)` for example). But I find this weird, if you tell the `search` function to search at the beginning of a string, it should be as fast as the `match` function. – baptx Jan 23 '19 at 20:23
  • @baptx My guess is that the `search` function has to parse and process the `^` information while `match` has it already hardcoded down in the c binary. The speed difference is only about 10 % on my PC anyway. – Jeyekomon Jan 24 '19 at 19:58
  • @Jeyekomon it could have come from here but I don't think it is the case since if we give the unnecessary `^` character to the `match` function, it does not take more time to read it (sometimes it was even a bit faster). – baptx Jan 26 '19 at 09:20
  • That `re.search()` is faster for this specific regex is not surprising at all. The `re.match()` pattern is longer to process if only because it's capturing the beginning of the string. – Denis de Bernardy Jul 21 '19 at 11:59
  • What about the library regex, is it faster than re? – skan Jul 29 '23 at 01:45
  • @skan The `regex` library is not really on-topic here. You should create a new question for that. – Jeyekomon Jul 31 '23 at 13:38
59

re.search searches for the pattern throughout the string, whereas re.match does not search for the pattern at all; it can only try to match it at the start of the string.

tzot
xilun
41

Refer to the example below to see how re.match and re.search work:

import re

a = "123abc"
t1 = re.match("[a-z]+", a)   # None: the string starts with digits
t2 = re.search("[a-z]+", a)  # match object for "abc"

re.match returns None, while re.search returns a match object; call t2.group() to get 'abc'.

Community
ldR
  • 5
    Would just like to add that search will return _sre.SRE_Match object (or None if not found). To get 'abc', you need to call t.group() – SanD Mar 01 '17 at 15:09
39

The difference is, re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, as John D. Cook remarks, re.match() "behaves as if every pattern has ^ prepended." In other words, without the MULTILINE flag, re.match('pattern') equals re.search('^pattern'). So it anchors a pattern's left side. But it doesn't anchor a pattern's right side: that still requires a terminating $.
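
A short sketch of both points, with a made-up two-line string (`text` is only an illustration):

```python
import re

text = "first\nsecond"

# Without flags the two calls agree: "second" is not at the very start.
print(re.match("second", text))                          # None
print(re.search("^second", text))                        # None

# With MULTILINE, ^ also matches right after a newline, so search succeeds ...
print(re.search("^second", text, re.MULTILINE).start())  # 6
# ... while match stays pinned to index 0 regardless of the flag.
print(re.match("second", text, re.MULTILINE))            # None

# The right-hand side is not anchored: trailing text is allowed
# unless the pattern itself ends with $.
print(re.match("fir", text).group())                     # fir
print(re.match("fir$", text))                            # None
```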

Frankly given the above, I think re.match() should be deprecated. I would be interested to know reasons it should be retained.

CODE-REaD
  • 5
    "behaves as if every pattern has ^ prepended." is only true if you don't use the multiline option. The correct statement is "... has \A prepended" – JoelFan Jun 27 '17 at 23:38
23

Much shorter:

  • search scans through the whole string.

  • match scans only the beginning of the string.

The following example shows it:

>>> import re
>>> a = "123abc"
>>> print(re.match("[a-z]+", a))
None
>>> re.search("[a-z]+", a).group()
'abc'
Cabbage soup
U13-Forward
  • Even with most of the examples posted here, I am having a hard time seeing the description 'beginning of the string' as an accurate statement. I don't know, it just seems arbitrary. How do I know where the beginning of the string 'ends'?? Is it via a newline? because based from the example here, 'beginning' simply means the very first character '1'. – noidentity63 Jan 05 '23 at 06:57
20

re.match attempts to match a pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.
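
A small sketch of "until it finds a match", assuming a made-up sample string. It also uses the optional pos argument of a compiled pattern to resume after the first hit:

```python
import re

text = "cat hat bat"
pattern = re.compile(r"[hb]at")

# search stops at the first position where the pattern fits.
first = pattern.search(text)
print(first.group(), first.span())     # hat (4, 7)

# Resuming from the end of the first hit finds the next one.
second = pattern.search(text, first.end())
print(second.group(), second.span())   # bat (8, 11)

# match only tries one position: index 0 by default, or pos if given.
print(pattern.match(text))             # None
print(pattern.match(text, 8).group())  # bat
```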

cschol
1

Quick answer

re.search('test', ' test')      # returns a truthy match object (the search may start at any index)

re.match('test', ' test')       # returns None (the match must start at index 0)
re.match('test', 'test')        # returns a truthy match object (match at index 0)
Pall Arpad
0

re.match is anchored at the beginning of a string, while re.search scans through the entire string. So in the following example, x and y match the same thing.

x = re.match('pat', s)       # <--- already anchored at the beginning of string
y = re.search(r'\Apat', s)   # <--- match at the beginning (raw string avoids an invalid-escape warning)

If a string doesn't contain line breaks, \A and ^ are essentially the same; the difference shows up in multiline strings. In the following example, re.match will never match the second line, while re.search can with the correct regex (and flag).

s = "1\n2"
re.match('2', s, re.M)       # no match
re.search('^2', s, re.M)     # match
re.search(r'\A2', s, re.M)   # no match  <--- mimics `re.match`

There's another function in re, re.fullmatch() that scans the entire string, so it is anchored both at the beginning and the end of a string. So in the following example, x, y and z match the same thing.

x = re.match(r'pat\Z', s)     # <--- already anchored at the beginning; must match end
y = re.search(r'\Apat\Z', s)  # <--- match at the beginning and end of string
z = re.fullmatch('pat', s)   # <--- already anchored at the beginning and end
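
To check that equivalence on concrete inputs, here is a small sketch; the sample strings are my own picks, including one with a trailing newline, which is where `\Z` and `$` differ:

```python
import re

samples = ["pat", "pattern", "a pat", "pat\n"]
results = []
for s in samples:
    x = re.match(r"pat\Z", s)
    y = re.search(r"\Apat\Z", s)
    z = re.fullmatch(r"pat", s)
    results.append((bool(x), bool(y), bool(z)))
    print(repr(s), results[-1])

# Only "pat" itself passes, and all three calls agree on every sample.

# $ is more lenient than \Z: it also matches just before a trailing newline.
lenient = re.match(r"pat$", "pat\n")
print(bool(lenient))   # True
```

That leniency is presumably why the patterns above end in `\Z` rather than `$`.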

Based on Jeyekomon's answer (and using their setup), I used the perfplot library to plot the results of timeit tests that look into:

  • how do they compare if re.search "mimics" re.match? (first plot)
  • how do they compare if re.match "mimics" re.search? (second plot)
  • how do they compare if the same pattern is passed to them? (last plot)

Note that the last pattern doesn't produce the same output (because re.match is anchored at the beginning of a string.)

[Figure: performance plot]

The first plot shows match is faster if search is used like match. The second plot supports @Jeyekomon's answer and shows search is faster if match is used like search. The last plot shows there's very little difference between the two if they scan for the same pattern.


Code used to produce the performance plot.

import re
from random import choices
from string import ascii_lowercase
import matplotlib.pyplot as plt
import perfplot  # the code below calls perfplot.plot, so import the module itself

patterns = [
    [re.compile(r'\Aword'), re.compile(r'word')],
    [re.compile(r'word'), re.compile(r'(.*?)word')],
    [re.compile(r'word')]*2
]

fig, axs = plt.subplots(1, 3, figsize=(20,6), facecolor='white')
for i, (pat1, pat2) in enumerate(patterns):
    plt.sca(axs[i])
    perfplot.plot(
        setup=lambda n: [''.join(choices(ascii_lowercase, k=10)) for _ in range(n)],
        kernels=[lambda lst: [*map(pat1.search, lst)], lambda lst: [*map(pat2.match, lst)]],
        labels= [f"re.search(r'{pat1.pattern}', w)", f"re.match(r'{pat2.pattern}', w)"],
        n_range=[2**k for k in range(24)],
        xlabel='Length of list',
        equality_check=None
    )
fig.suptitle('re.match vs re.search')
fig.tight_layout();
cottontail