In PCRE, the quantifier `+` after another quantifier (either `*`, `+`, `?`, or even `{m,n}`) actually modifies the preceding quantifier so that it now matches possessively.
`*+` is a possessive quantifier meaning 0 or more, without backtracking.
Backtracking is one of the basic processes in regex. Let's say you have `abcbaba` as the string and use the regex `.*bc`.
The engine will move following the arrow, first with `.*`:
a b c b a b a
^
a b c b a b a
  ^
a b c b a b a
    ^
a b c b a b a
      ^
a b c b a b a
        ^
a b c b a b a
          ^
a b c b a b a
            ^
a b c b a b a
              ^
At this point, it cannot match more, so it will backtrack one character at a time to be able to match the `b` in the regex.
a b c b a b a
            ^
No `b`, continue:
a b c b a b a
          ^
There, `b` matches, so it tries to match `c`, but cannot find one. It will backtrack again and a couple of steps later...
a b c b a b a
  ^
So `.*` ended up matching only `a`.
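The walk above can be mimicked with a small toy model. This is only a sketch, not how PCRE is implemented: the helper name `dot_star_then_literal` is made up for illustration, it handles just the special case of `.*` followed by a literal, and it assumes the match is attempted from the start of the string.

```python
def dot_star_then_literal(text, literal):
    """Toy model of matching `.*` followed by `literal` from position 0.

    Returns (kept, tried): `kept` is how many characters `.*` ends up
    holding (None if no match), and `tried` lists every position at
    which the literal was attempted, in order.
    """
    tried = []
    end = len(text)              # greedy `.*` first grabs everything
    while end >= 0:
        tried.append(end)
        if text.startswith(literal, end):
            return end, tried    # the rest of the regex matches here
        end -= 1                 # backtrack: give one character back
    return None, tried


print(dot_star_then_literal("abcbaba", "bc"))
# (1, [7, 6, 5, 4, 3, 2, 1])
```

The literal is first tried at the end of the string (position 7), then one position further left each time, until it fits at position 1. At that point `.*` is left holding a single character, the leading `a`.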
With `.*+`, you get the `.*` to match everything like in the first case...
a b c b a b a
              ^
But then it cannot match anything more, and it is not allowed to backtrack. So the match fails.
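The difference can be checked outside PCRE as well. The sketch below uses Python's `re` module rather than PCRE itself, on the assumption that its possessive quantifiers (available from Python 3.11 onward) behave the same way for this simple pattern. The helper `try_patterns` is a name chosen for illustration.

```python
import re
import sys

# Possessive quantifiers were added to Python's re module in 3.11.
SUPPORTS_POSSESSIVE = sys.version_info >= (3, 11)


def try_patterns(text="abcbaba"):
    """Return what the greedy and the possessive pattern match in `text`."""
    results = {}
    for name, pattern in (("greedy", r".*bc"), ("possessive", r".*+bc")):
        m = re.search(pattern, text)
        results[name] = m.group(0) if m else None
    return results


if SUPPORTS_POSSESSIVE:
    print(try_patterns())
    # {'greedy': 'abc', 'possessive': None}
```

The greedy pattern gives characters back until `bc` fits, while the possessive one keeps the whole string and so finds nothing, whichever starting position the engine tries.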
Sometimes you want backtracking; at other times you don't, and it is just a nuisance. That's why you have possessive quantifiers and atomic groups: they stop the engine from retrying positions that cannot lead to a match, which speeds things up.
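As a rough illustration of the two constructs side by side, the sketch below again uses Python's `re` module (3.11 or later) as a stand-in for PCRE. The patterns `[^c]*+c` and `(?>[^c]*)c` are examples chosen here, not taken from the question. They show that a possessive quantifier is harmless when the quantified token could never match what follows, and that an atomic group written as `(?>...)` gives the same result.

```python
import re
import sys

# Atomic groups and possessive quantifiers need Python 3.11 or later.
SUPPORTS_ATOMIC = sys.version_info >= (3, 11)


def first_match(pattern, text="abcbaba"):
    """Return the text of the first match of `pattern`, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def demo():
    return {
        # [^c] can never swallow the `c`, so nothing needs to be given back.
        "possessive": first_match(r"[^c]*+c"),
        "atomic": first_match(r"(?>[^c]*)c"),
        # `.` does swallow the `bc`, and neither form may give it back.
        "possessive_dot": first_match(r".*+bc"),
        "atomic_dot": first_match(r"(?>.*)bc"),
    }


if SUPPORTS_ATOMIC:
    print(demo())
    # {'possessive': 'abc', 'atomic': 'abc', 'possessive_dot': None, 'atomic_dot': None}
```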