
From the documentation, it's very clear that:

  • match() -> applies the pattern only at the beginning of the string
  • search() -> scans through the string and returns the first match

And search() with a leading '^' and without the re.M flag would behave the same as match().

Then why does Python have match()? Isn't it redundant? Are there any performance benefits to keeping match() in Python?

Eric
jai.maruthi
  • It's a convenience for a common pattern, and it makes the intent clearer. – Barmar Apr 22 '15 at 19:06
  • See [this question](https://stackoverflow.com/questions/12803709/re-match-vs-re-search-performance-difference) for performance benchmarks. `re.search` can actually be faster at times. – miradulo Apr 22 '15 at 19:07
  • possible duplicate: http://stackoverflow.com/questions/180986/what-is-the-difference-between-pythons-re-search-and-re-match – karthik manchala Apr 22 '15 at 19:12
  • Zen of Python: "There should be one — and preferably only one — obvious way to do it.". This obviously violates that. – Smit Johnth Jul 14 '15 at 19:17
  • @karthikmanchala this question doesn't ask what the difference is but "who the heck did it and why?" – Smit Johnth Jul 14 '15 at 19:23

2 Answers


The pos argument behaves differently in important ways:

>>> import re
>>> s = "a ab abc abcd"
>>> re.compile('a').match(s, pos=2)
<re.Match object; span=(2, 3), match='a'>
>>> print(re.compile('^a').search(s, pos=2))
None

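match makes it possible to write a tokenizer and ensure that characters are never skipped, because a match must begin exactly at pos. search has no way of saying "start from the earliest allowable character": '^' still refers to the real start of the string rather than to pos, and without an anchor search simply scans forward until something matches.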
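To illustrate the skipping behaviour, here is a small sketch that reuses the string s from above. The pattern 'b' is only an illustrative choice, not something from the original example:

```python
import re

s = "a ab abc abcd"
b = re.compile("b")

# match() must succeed exactly at pos=2, where the character is 'a'.
anchored = b.match(s, 2)

# search() is free to scan forward from pos=2 and silently skips the 'a'.
scanned = b.search(s, 2)

print(anchored)        # None
print(scanned.span())  # (3, 4)
```

A tokenizer built on search would therefore step over the unexpected character without reporting it.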

Example use of match to break up a string with no gaps:

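def tokenize(s, patt):
    """Yield consecutive matches of the compiled pattern patt that cover s with no gaps."""
    at = 0
    while at < len(s):
        # match() must succeed exactly at `at`; it never scans ahead.
        m = patt.match(s, pos=at)
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        # The next token starts where this one ended.
        at = m.end()
        yield m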
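
A possible way to exercise this, as a sketch only: the TOKEN pattern below is a hypothetical example (digits, lowercase words, whitespace) and is not part of the original answer. The tokenize function is repeated so that the snippet runs on its own.

```python
import re

def tokenize(s, patt):
    at = 0
    while at < len(s):
        m = patt.match(s, pos=at)
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        at = m.end()
        yield m

# Hypothetical token pattern, chosen only for illustration.
TOKEN = re.compile(r"\d+|[a-z]+|\s+")

tokens = [m.group() for m in tokenize("abc 123 de", TOKEN)]
print(tokens)  # ['abc', ' ', '123', ' ', 'de']

# An unexpected character is reported instead of being skipped.
try:
    list(tokenize("ab!cd", TOKEN))
except ValueError as exc:
    error = str(exc)
    print(error)  # Did not expect character at location 2

# By contrast, finditer() scans like search() and silently drops the '!'.
skipped = [m.group() for m in TOKEN.finditer("ab!cd")]
print(skipped)  # ['ab', 'cd']
```

Note that a pattern able to match the empty string would make the loop stall, so the sketch assumes every alternative consumes at least one character.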
Eric
  • This may be true for the pattern.match() and pattern.search() methods. For the re.match(patn, string, flags=0) function, which doesn't have a 'pos' argument, this explanation doesn't hold. As @Tim pointed out, it may be a useful shortcut that saves the programmer from having to use '^'. – jai.maruthi Apr 24 '15 at 02:09
  • @mhr: It would be inconsistent if the match method was missing at the module level though – Eric Apr 24 '15 at 10:47

"Why" questions are hard to answer. As a matter of fact, you could define the function re.match() like this:

import re

def match(pattern, string, flags=0):
    # Anchor the (grouped) pattern to the very start of the string.
    return re.search(r"\A(?:" + pattern + ")", string, flags)

(because \A always matches at the start of the string, regardless of the re.M flag status; the non-capturing group keeps any alternation in the pattern inside the anchor).
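
As a rough check of that equivalence, the sketch below compares such a wrapper with re.match on a few inputs. The name match_via_search is made up for this illustration, and the comparison covers only the module-level functions, which take no pos argument:

```python
import re

def match_via_search(pattern, string, flags=0):
    # Hypothetical helper: emulate re.match() with re.search() and \A.
    return re.search(r"\A(?:" + pattern + ")", string, flags)

text = "a\nb"

# With re.M, '^' also matches after the newline, so it is not a safe anchor.
caret = re.search("^b", text, re.M)
print(caret.span())  # (2, 3)

# \A and re.match() both stay tied to the real start of the string.
print(match_via_search("b", text, re.M))  # None
print(re.match("b", text, re.M))          # None

# Without the (?:...) group, the anchor would bind only to the first alternative.
ungrouped = re.search(r"\Aa|b", "xb")
print(ungrouped.span())               # (1, 2)
print(match_via_search("a|b", "xb"))  # None
```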

So re.match is a useful shortcut but not strictly necessary. It's especially confusing for Java programmers who have Pattern.matches() which anchors the search to the start and end of the string (which is probably a more common use case than just anchoring to the start).

It's different for the match and search methods of regex objects, though, as Eric has pointed out.

Tim Pietzcker
  • _"you could define the function"_ No you can't. This match gives a different result when used with the `pos` argument (assuming you forwarded it where necessary) – Eric Apr 22 '15 at 19:18
  • @Eric: The module-level *functions* don't have a `pos` argument. Only the regex object's methods do (which is why I mentioned your answer in the last line). – Tim Pietzcker Apr 22 '15 at 19:18