What is the difference between:
(.+?)
and
(.*?)
when I use them in my PHP preg_match regex?
They are called quantifiers.
The quantifier * matches 0 or more of the preceding expression.
The quantifier + matches 1 or more of the preceding expression.
By default a quantifier is greedy, which means it matches as many characters as possible.
A ? after a quantifier changes this behaviour and makes the quantifier "ungreedy" (lazy), meaning it will match as few characters as possible.
Example greedy/ungreedy
For example, on the string "abab", a.*b will match "abab" (preg_match_all will return one match, "abab"), while a.*?b will match only the leading "ab" (preg_match_all will return two matches, "ab" and "ab").
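The same comparison can be reproduced in Python. This is a sketch that assumes Python's re module behaves like PHP's PCRE for these simple greedy and lazy quantifiers (it does for this case), with re.findall playing the role of preg_match_all:

```python
import re

text = "abab"

# Greedy: .* grabs as much as possible, then backtracks only as far as the last "b".
greedy = re.findall(r"a.*b", text)

# Lazy: .*? grabs as little as possible, so each match stops at the first "b" it reaches.
lazy = re.findall(r"a.*?b", text)

print(greedy)  # ['abab']
print(lazy)    # ['ab', 'ab']
```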
You can test your regexes online, e.g. on RegExr.
The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match any character (.), except newlines unless the s modifier is set.
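Here is a small Python sketch of where (.+?) and (.*?) actually give different results. It uses Python's re module in place of preg_match, and the input "<><b>" is a made-up example chosen so that one pair of delimiters is empty:

```python
import re

# Hypothetical sample input: an empty pair "<>" followed by "<b>".
text = "<><b>"

# .*? may match zero characters, so the empty pair "<>" yields an empty capture.
star_lazy = re.findall(r"<(.*?)>", text)

# .+? must consume at least one character, so it swallows the first ">"
# and keeps extending until it reaches the next ">".
plus_lazy = re.findall(r"<(.+?)>", text)

print(star_lazy)  # ['', 'b']
print(plus_lazy)  # ['><b']

# On "<>" alone, .+? has nothing it may capture, so the whole match fails.
print(re.search(r"<(.+?)>", "<>"))  # None
```

In other words, both are lazy, but .+? can never settle for an empty capture, so it may run past a delimiter that .*? would stop at.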
In RegEx, {i,f} means "between i and f matches". Let's take a look at the following examples:
{3,7} means between 3 and 7 matches.
{,10} means up to 10 matches with no lower limit (i.e. the lower limit is 0).
{3,} means at least 3 matches with no upper limit (i.e. the upper limit is infinity).
{,} means no upper or lower limit on the number of matches (i.e. the lower limit is 0 and the upper limit is infinity).
{5} means exactly 5 matches.
In PHP's PCRE, write the lower limit explicitly, as {0,10} and {0,}: many PCRE versions treat {,10} and {,} as literal text rather than as quantifiers.
Most good languages contain abbreviations, and so does RegEx:
The + quantifier is shorthand for {1,}.
The * quantifier is shorthand for {0,}.
The ? quantifier is shorthand for {0,1}.
This means + requires at least 1 match, * accepts any number of matches including none at all, and ? accepts at most 1 match (zero or one).
Credit: Codecademy.com
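A minimal Python sketch of the ranges and shorthands above, using re.fullmatch so the whole string must satisfy the quantifier. The helper name repeats is hypothetical. Lower bounds are written out explicitly ({0,10} rather than {,10}) because that form is accepted by both Python and PCRE:

```python
import re

def repeats(pattern: str, s: str) -> bool:
    """Hypothetical helper: True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

# {3,7}: between 3 and 7 repetitions of "a".
print([repeats(r"a{3,7}", "a" * n) for n in (2, 3, 7, 8)])  # [False, True, True, False]

# {0,10}: up to 10, lower limit written explicitly as 0.
print(repeats(r"a{0,10}", ""), repeats(r"a{0,10}", "a" * 11))  # True False

# {3,}: at least 3, no upper limit.
print(repeats(r"a{3,}", "aa"), repeats(r"a{3,}", "a" * 50))  # False True

# {5}: exactly 5.
print(repeats(r"a{5}", "aaaa"), repeats(r"a{5}", "aaaaa"))  # False True

# Each shorthand accepts exactly the same lengths as its brace form (checked for 0..5).
for short, long in ((r"a+", r"a{1,}"), (r"a*", r"a{0,}"), (r"a?", r"a{0,1}")):
    for n in range(6):
        assert repeats(short, "a" * n) == repeats(long, "a" * n)
```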
In .+?, the + matches at least one character; in .*?, the * matches any number of characters (including 0).
The ? makes the expression lazy, so it will match as few characters as possible.
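When nothing follows the lazy quantifier in the pattern, "as few as possible" means its minimum. This Python sketch shows that; PCRE follows the same rule, so preg_match should report the same matched text, though the example itself only exercises Python's re:

```python
import re

text = "abc"

# With nothing after it, a lazy quantifier stops at its minimum:
lazy_star = re.match(r".*?", text).group()  # zero characters
lazy_plus = re.match(r".+?", text).group()  # exactly one character

# The greedy versions take everything they can.
greedy_star = re.match(r".*", text).group()
greedy_plus = re.match(r".+", text).group()

print(repr(lazy_star), repr(lazy_plus))      # '' 'a'
print(repr(greedy_star), repr(greedy_plus))  # 'abc' 'abc'
```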
A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.
So basically, if you use +, there must be at least one instance of the pattern; if you use *, it will still match if there are no instances of it.
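The "preceding pattern" can be a whole group, not just one character. A short Python sketch, using a made-up pattern of "x", some "ab" pairs, then "y"; (?:...) is a non-capturing group, which PCRE supports with the same syntax:

```python
import re

# Hypothetical patterns: "x", then "ab" pairs, then "y".
star = re.compile(r"x(?:ab)*y")  # zero or more "ab" pairs
plus = re.compile(r"x(?:ab)+y")  # one or more "ab" pairs

for s in ("xy", "xaby", "xababy"):
    print(s, bool(star.fullmatch(s)), bool(plus.fullmatch(s)))
# xy True False
# xaby True True
# xababy True True
```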
Consider the following string to match:
ab
The pattern (ab.*) will return a match, with the capture group containing ab, while the pattern (ab.+) will not match and returns nothing.
But if you change the string to the following, the pattern (ab.+) will return aba:
aba
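The same three cases as a Python sketch (re.search standing in for preg_match; the helper name capture is hypothetical):

```python
import re

def capture(pattern: str, s: str):
    """Hypothetical helper: group 1 of the first match, or None if there is no match."""
    m = re.search(pattern, s)
    return m.group(1) if m else None

print(capture(r"(ab.*)", "ab"))   # ab    (.* is happy with zero characters)
print(capture(r"(ab.+)", "ab"))   # None  (.+ needs one more character after "ab")
print(capture(r"(ab.+)", "aba"))  # aba
```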
A star is very similar to a plus, the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.
I think the previous answers fail to highlight a simple example. Say we have an array:
numbers = [5, 15]
The regex ^[0-9]+5 matches only 15, whereas ^[0-9]*5 matches both 5 and 15. The difference is that the + operator requires at least one occurrence of the preceding expression ([0-9]) before the 5, while * also accepts zero occurrences.
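To see which array entries each variant accepts, here is a small Python sketch that runs the plain and the 5-suffixed patterns side by side. Python's re module stands in for PHP's preg functions, and the numbers are converted to strings because regexes operate on text:

```python
import re

numbers = [5, 15]
patterns = [r"^[0-9]+", r"^[0-9]*", r"^[0-9]+5", r"^[0-9]*5"]

# For each pattern, keep the numbers whose string form it matches.
results = {p: [n for n in numbers if re.search(p, str(n))] for p in patterns}

for p, matched in results.items():
    print(f"{p:10} -> {matched}")
# ^[0-9]+    -> [5, 15]
# ^[0-9]*    -> [5, 15]
# ^[0-9]+5   -> [15]
# ^[0-9]*5   -> [5, 15]
```

Note that ^[0-9]* on its own can never fail: on a string with no leading digit, such as "x", it still succeeds with an empty match.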