
The following is a simple piece of code that performs a regex match:

import re

pattern = ".*"
s = "ab"
print(re.search(pattern, s))

output:

<_sre.SRE_Match object; span=(0, 2), match='ab'>

My confusion is this: "." matches any single character, so here it can match "a" or "b". With a "*" after it, I would expect the combination to match "", "a", "aa", "aaa...", "b", "bb", "bbb...", or any other single character repeated several times.

But how come ".*" matches "ab" as a whole?

Lunam

1 Answer


The comments more or less covered it, but to provide an answer: the pattern .* means match any character (.), except a newline by default, zero or more times (*). By default, regex quantifiers are greedy, so when presented with 'abc', even though '' or 'a' would also satisfy the pattern, it matches the entire string, since matching all of it still meets the requirement.

It does not mean to match the same character zero or more times. Every character it matches can be a different character or the same as a previously matched one.
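To make the greedy behaviour concrete, here is a small sketch using the question's string "ab". The variable names are just illustrative, and the lazy quantifier `*?` is added only for contrast; it is not part of the original question.

```python
import re

s = "ab"

# Greedy: each repetition of "." may consume a different character,
# so ".*" takes the whole string.
greedy = re.search(r".*", s)
print(greedy.group())        # ab

# Lazy: "*?" takes as few repetitions as possible, so the match is empty.
lazy = re.search(r".*?", s)
print(repr(lazy.group()))    # ''

# "." does not match a newline by default, so the match stops before it.
multiline = re.search(r".*", "ab\ncd")
print(multiline.group())     # ab
```

The greedy match covers both characters because the quantifier repeats the "." token, not whatever character "." matched the first time.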

If instead you want to match any character, but match as many of that same character as possible, zero or more times, you can use:

(.)?\1*

See here https://regex101.com/r/FgvuX2/1 and here https://regex101.com/r/FgvuX2/2

What this effectively does is optionally match a single character and capture it in group 1, creating a backreference that the second part of the expression can use. Thus it matches any single character (if there is one) into group 1 and then greedily matches \1, that same character, zero or more times.
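As a rough check of the backreference pattern in Python, the sketch below runs it against a few inputs. The helper name `leading_run` is hypothetical and exists only for this illustration.

```python
import re

# Pattern from the answer: an optional captured character,
# followed by zero or more repeats of that same character.
SAME_CHAR_RUN = re.compile(r"(.)?\1*")


def leading_run(text):
    """Return the run of identical characters at the start of text."""
    return SAME_CHAR_RUN.match(text).group()


for text in ["ab", "aab", "bbb", ""]:
    print(repr(text), "->", repr(leading_run(text)))
# 'ab' -> 'a'
# 'aab' -> 'aa'
# 'bbb' -> 'bbb'
# '' -> ''

# Unlike ".*", this pattern cannot cover "ab" as a whole.
print(SAME_CHAR_RUN.fullmatch("ab"))   # None
```

Unlike `.*`, this pattern stops as soon as the character changes, which is the behaviour the question originally expected.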

Grismar