Python regex - understanding the difference between match and search

Question

From what I figured,

match: given a string str and a pattern pat, match checks if str matches the pattern from str's start.

search: given a string str and a pattern pat, search checks if str matches the pattern from every index of str.

If so, is there any point in using '^' at the start of a regex with match?

From what I understood, since match already checks from the start, there isn't. I'm probably wrong; where is my mistake?
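The mental model above can be sketched as follows. This is purely an illustration under the assumption that `search()` behaves like "try `match()` at each index in turn and return the first success"; the helper name `naive_search` is hypothetical, and CPython does not implement `search()` this way internally.

```python
import re

def naive_search(pattern, text, flags=0):
    """Hypothetical helper: emulate re.search() by calling Pattern.match()
    at each start index, from 0 up to and including len(text)."""
    rx = re.compile(pattern, flags)
    for i in range(len(text) + 1):
        # Pattern.match(text, pos) anchors this attempt at index pos
        m = rx.match(text, i)
        if m:
            return m
    return None

# Compare spans against the real re.search() on a few cases
cases = [
    ("c", "abcdef", 0),
    ("^c", "abcdef", 0),
    ("^a", "abcdef", 0),
    ("^X", "A\nB\nX", re.MULTILINE),
]
for pattern, text, flags in cases:
    ours = naive_search(pattern, text, flags)
    real = re.search(pattern, text, flags)
    print(repr(pattern), ours and ours.span(), real and real.span())
```

Note that `^` in `"^c"` still fails at index 2 here, because without `re.MULTILINE` it only matches at the real start of the string, even when `match()` is given a later start position.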

Have you read this? http://docs.python.org/library/re.html#search-vs-match. It explains everything. — jamylak, May 26 '12 at 12:46
Possible duplicate of [What is the difference between re.search and re.match?](https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match) — olbinado11, Feb 26 '19 at 00:43

jamylak · Answer 1 · answered May 26 '12 at 13:03

I believe there is no use for it. The following is copy/pasted from: http://docs.python.org/library/re.html#search-vs-match

Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).

For example:

>>> re.match("c", "abcdef")  # No match
>>> re.search("c", "abcdef") # Match
<_sre.SRE_Match object at ...>

Regular expressions beginning with '^' can be used with search() to restrict the match at the beginning of the string:

>>> re.match("c", "abcdef")  # No match
>>> re.search("^c", "abcdef") # No match
>>> re.search("^a", "abcdef")  # Match
<_sre.SRE_Match object at ...>

Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with '^' will match at the beginning of each line.

>>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<_sre.SRE_Match object at ...>
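A quick check of the point behind the question, using only the standard `re` module: for the module-level `re.match()`, adding a leading `^` does not change the result, with or without `re.MULTILINE`, because `match()` only attempts index 0 and `^` always holds there.

```python
import re

text = "A\nB\nX"
for label, flags in (("default", 0), ("MULTILINE", re.MULTILINE)):
    # re.match() only attempts index 0, and ^ always matches there,
    # so a leading ^ should not change the outcome in either mode.
    for pat in ("X", "A"):
        plain = re.match(pat, text, flags) is not None
        anchored = re.match("^" + pat, text, flags) is not None
        print(label, pat, plain, anchored)
```

By contrast, `re.search("^X", text, re.MULTILINE)` does find the `X` at the start of the third line.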

So ^ at the start of a regex used with match is meaningless, right? — user1413824, May 26 '12 at 13:00
Well the symbol itself can be used for other things such as in `[^\w]` for example but I can't see any use for it for checking for the start. — jamylak, May 26 '12 at 13:02
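To illustrate the other use of `^` mentioned in that comment: as the first character inside square brackets it negates the character class, which has nothing to do with anchoring. The sample string is made up for illustration.

```python
import re

# [^\w] matches any single character that is NOT a word character;
# here it finds the same characters as \W. The ^ means negation, not an anchor.
print(re.findall(r"[^\w]", "a-b c!"))
print(re.findall(r"\W", "a-b c!"))
```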

score 2 · Answer 2 · answered May 26 '12 at 12:47


In normal mode, you don't need ^ if you are using match. But in multiline mode (re.MULTILINE), it can be useful, because ^ can match not only at the beginning of the whole string but also at the beginning of every line.
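A small sketch, using only the standard `re` module, of one place where `re.MULTILINE` changes what `re.match()` returns: a `^` that follows a newline inside the pattern. (A `^` at the very start of the pattern is satisfied at index 0 either way.) The sample text is illustrative.

```python
import re

text = "first line\nsecond line"
pat = r"first line\n^second"

# Without MULTILINE, ^ only matches at index 0, so the ^ after \n fails
print(re.match(pat, text))
# With MULTILINE, ^ also matches right after each newline
print(re.match(pat, text, re.MULTILINE))
```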

chys

so re.MULTILINE has no meaning without ^ at start? – user1413824 May 26 '12 at 13:04
@user1413824 It seems so, except that `$` is also affected. According to the Python docs, all that `re.MULTILINE` does is change the meanings of `^` and `$` – chys Jul 30 '12 at 12:10
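As that comment notes, `$` is the other metacharacter whose meaning `re.MULTILINE` changes. A minimal sketch with the standard `re` module and a made-up two-line string:

```python
import re

text = "one\ntwo\n"
# Default: $ matches only at the end of the string or just before a final newline
print(re.findall(r"\w+$", text))
# MULTILINE: $ also matches just before every newline
print(re.findall(r"\w+$", text, re.MULTILINE))
```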

score 2 · Accepted Answer · answered May 26 '12 at 16:25

When calling the function re.match specifically, the ^ character has little meaning, because this function begins the matching process at the beginning of the string. However, it does have meaning for other functions in the re module, and when calling match on a compiled regular expression object.

For example:

import re

text = """\
Mares eat oats
and does eat oats
"""

print(re.findall(r'^(\w+)', text, re.MULTILINE))

This prints:

['Mares', 'and']

With re.findall() and re.MULTILINE enabled, this gives you the first word (with no leading whitespace) on each line of the text.
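For comparison, a sketch on the same text showing that `re.MULTILINE` is what makes `^` match at every line start here; without it, `findall()` only finds the word at the very start of the string.

```python
import re

text = """\
Mares eat oats
and does eat oats
"""

print(re.findall(r"^(\w+)", text))                # ^ only at the string start
print(re.findall(r"^(\w+)", text, re.MULTILINE))  # ^ at every line start
```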

It can also be useful for something more complex, like lexical analysis with regular expressions, where you pass the compiled regular expression a starting position in the text at which to begin matching (for example, the end position of the previous match). See the documentation for the RegexObject.match method (Pattern.match in current Python 3 docs).
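A minimal sketch of that compiled-pattern case, with a made-up string: once a start position is passed to `match()`, a leading `^` is no longer redundant. Without `re.MULTILINE` it matches only at the real start of the string; with it, it also matches right after a newline, but still not at an arbitrary `pos`.

```python
import re

text = "ab\ncd"  # indices: a=0, b=1, \n=2, c=3, d=4

# Without ^, match() at pos=1 happily starts at 'b'
print(re.compile(r"b").match(text, 1))
# With ^ (no MULTILINE), ^ only matches at the real start, so pos=1 fails
print(re.compile(r"^b").match(text, 1))
# With MULTILINE, ^ also matches after a newline, so pos=3 ('c') succeeds
print(re.compile(r"^c", re.MULTILINE).match(text, 3))
# ...but pos=1 still fails, because index 1 is not the start of a line
print(re.compile(r"^b", re.MULTILINE).match(text, 1))
```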

Simple lexer / scanner as an example:

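import re

text = """\
Mares eat oats
and does eat oats
"""

# One named group per token type; the first alternative that matches wins
pattern = r"""
(?P<firstword>^\w+)
|(?P<lastword>\w+$)
|(?P<word>\w+)
|(?P<whitespace>\s+)
|(?P<other>.)
"""

rx = re.compile(pattern, re.MULTILINE | re.VERBOSE)

def scan(text):
    pos = 0
    # match() anchors each attempt at pos; ^ and $ still see line boundaries
    m = rx.match(text, pos)
    while m:
        toktype = m.lastgroup          # name of the group that matched
        tokvalue = m.group(toktype)
        pos = m.end()                  # resume where this token ended
        yield toktype, tokvalue
        m = rx.match(text, pos)

for tok in scan(text):
    print(tok)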

which prints:

('firstword', 'Mares')
('whitespace', ' ')
('word', 'eat')
('whitespace', ' ')
('lastword', 'oats')
('whitespace', '\n')
('firstword', 'and')
('whitespace', ' ')
('word', 'does')
('whitespace', ' ')
('word', 'eat')
('whitespace', ' ')
('lastword', 'oats')
('whitespace', '\n')

This distinguishes between three types of word: a word at the beginning of a line, a word at the end of a line, and any other word.
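To see how much the scanner relies on `re.MULTILINE`, here is a sketch that compiles the same pattern with and without it (the helper `scan_with` is hypothetical, not part of the answer). Without the flag, `^` only matches at the start of the string and `$` only at the end or before the final newline, so `and` and the first-line `oats` come back as plain `word` tokens.

```python
import re

text = """\
Mares eat oats
and does eat oats
"""

pattern = r"""
(?P<firstword>^\w+)
|(?P<lastword>\w+$)
|(?P<word>\w+)
|(?P<whitespace>\s+)
|(?P<other>.)
"""

def scan_with(flags):
    """Hypothetical helper: same scanner loop as the answer's, with configurable flags."""
    rx = re.compile(pattern, flags)
    pos = 0
    m = rx.match(text, pos)
    while m:
        yield m.lastgroup, m.group(m.lastgroup)
        pos = m.end()
        m = rx.match(text, pos)

# Keep only the non-whitespace tokens so the difference is easy to see
for flags in (re.MULTILINE | re.VERBOSE, re.VERBOSE):
    print([tok for tok in scan_with(flags) if tok[0] != "whitespace"])
```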
