129

What is the difference between:

(.+?)

and

(.*?)

when I use them in my PHP preg_match regex?

David19801

9 Answers

205

They are called quantifiers.

* 0 or more of the preceding expression

+ 1 or more of the preceding expression

By default a quantifier is greedy, meaning it matches as many characters as possible.

The ? after a quantifier makes that quantifier "ungreedy", meaning it matches as few characters as possible.

Example greedy/ungreedy

For example on the string "abab"

a.*b will match "abab" (preg_match_all will return one match, the "abab")

while a.*?b will match only the starting "ab" (preg_match_all will return two matches, each of them "ab")
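The "abab" example can be reproduced with a short script. This is a minimal sketch using Python's `re` module rather than PHP's preg functions; `re.findall` stands in for preg_match_all here, and the variable names are only illustrative. Greedy and lazy quantifiers behave the same way in both engines for these patterns.

```python
import re

text = "abab"

# Greedy: .* grabs as much as it can, so a single match spans the whole string.
greedy = re.findall(r"a.*b", text)

# Ungreedy: .*? stops at the first "b" it can reach, so the scan
# continues after it and finds a second short match.
lazy = re.findall(r"a.*?b", text)

print(greedy)  # ['abab']
print(lazy)    # ['ab', 'ab']
```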

You can test your regexes online, e.g. on Regexr; see the greedy example here

Morten Jensen
stema
  • "lazy" is the more common term for "ungreedy" – Walter Tross Mar 11 '17 at 10:40
  • The example is incorrect. `(.+?)` and `(.*?)` behave differently depending on their position in the regular expression, e.g. `a(.+?)`, `(.+?)b`, `a(.+?)b`, `a(.*?)`, `(.*?)b`, `a(.*?)b`. – Louis55 Nov 15 '18 at 08:36
  • Why wouldn't a.*b give back "ab"? Isn't it saying "word that has between a and b, 0 or more characters", therefore, ab has zero character between and could be a match. Why is this incorrect? – Hello World Jul 22 '20 at 03:40
  • @HelloWorld, this has to do with the greediness I explained above. `.*` will match as much as possible. If you want to stop as early as possible, then you have to make it ungreedy `.*?` – stema Jul 24 '20 at 07:16
  • FYI for newbies: `a.*?b` will match both the 1st and 2nd "ab"s in "abab", if you use "g" (global) flag. Also, the term "ungreedy" is better than "lazy" in this specific explanation because "lazy" is a commonly used term in programming, and it's a bit different from what `?` does in this example. – starriet Aug 27 '22 at 12:14
24

The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match anything (.).
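To see the two patterns from the question side by side, here is a small sketch in Python's `re` (an assumption for illustration; the question is about PHP, where these quantifiers behave the same way). The surrounding quote characters and the sample strings are made up for the example.

```python
import re

plus_lazy = re.compile(r'"(.+?)"')  # needs at least one character between the quotes
star_lazy = re.compile(r'"(.*?)"')  # also accepts nothing between the quotes

empty_quotes = '""'
filled_quotes = '"hi"'

# On an empty pair of quotes only the * version can match.
star_on_empty = star_lazy.search(empty_quotes)
plus_on_empty = plus_lazy.search(empty_quotes)

# When there is something to capture, both give the same result.
star_on_filled = star_lazy.search(filled_quotes)
plus_on_filled = plus_lazy.search(filled_quotes)

print(repr(star_on_empty.group(1)))  # ''
print(plus_on_empty)                 # None
print(star_on_filled.group(1))       # hi
print(plus_on_filled.group(1))       # hi
```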

Quentin
16

In RegEx, {i,f} means "between i and f matches". Let's take a look at the following examples:

  • {3,7} means between 3 and 7 matches
  • {0,10} means up to 10 matches with no lower limit (i.e. the lower limit is 0); some flavors also accept {,10}, but PCRE, which PHP's preg functions use, has historically treated {,10} as literal text
  • {3,} means at least 3 matches with no upper limit (i.e. the upper limit is infinity)
  • {0,} means no upper limit or lower limit for the number of matches (i.e. the lower limit is 0 and the upper limit is infinity); the bare form {,} is not a portable quantifier
  • {5} means exactly 5 matches

Most good languages contain abbreviations, so does RegEx:

  • + is the shorthand for {1,}
  • * is the shorthand for {0,}
  • ? is the shorthand for {0,1}

This means + requires at least 1 match, while * accepts any number of matches (including none) and ? accepts at most 1 match.
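The shorthand equivalences can be checked mechanically. The sketch below uses Python's `re` and writes every lower bound explicitly, since forms with an omitted lower bound are not handled the same way by every engine. The helper `same_matches` and the sample strings are hypothetical, chosen only for this illustration.

```python
import re

def same_matches(pattern_a, pattern_b, samples):
    """Return True if both patterns find identical matches in every sample."""
    return all(re.findall(pattern_a, s) == re.findall(pattern_b, s) for s in samples)

samples = ["", "a", "aa", "baaab", "aaaaaaa"]

plus_same = same_matches(r"a+", r"a{1,}", samples)      # + versus {1,}
star_same = same_matches(r"a*", r"a{0,}", samples)      # * versus {0,}
optional_same = same_matches(r"a?", r"a{0,1}", samples)  # ? versus {0,1}

# {5} means exactly five repetitions: five match, four do not.
five_ok = re.fullmatch(r"a{5}", "aaaaa") is not None
four_rejected = re.fullmatch(r"a{5}", "aaaa") is None

print(plus_same, star_same, optional_same, five_ok, four_rejected)
```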

Credit: Codecademy.com

Miladiouss
11

+ matches at least one character

* matches any number (including 0) of characters

The ? indicates a lazy expression, so it will match as few characters as possible.

Xophmeister
10

A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.

So basically, if you use a + there must be at least one instance of the pattern; if you use * it will still match when there are no instances of it.

DaveRandom
9

Consider the following string to match:

ab

The pattern (ab.*) matches, and the capture group contains ab.

The pattern (ab.+), however, does not match, so nothing is returned.

But if you change the string to the following, the pattern (ab.+) returns aba:

aba
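
The same three cases, as a quick check. This is a sketch in Python's `re` (assumed here for illustration; the patterns behave the same under PHP's preg_match), with illustrative variable names.

```python
import re

# "ab" followed by zero or more characters: matches the bare "ab".
star_on_ab = re.search(r"(ab.*)", "ab")

# "ab" followed by at least one character: nothing follows "ab", so no match.
plus_on_ab = re.search(r"(ab.+)", "ab")

# With one extra character the + version matches as well.
plus_on_aba = re.search(r"(ab.+)", "aba")

print(star_on_ab.group(1))   # ab
print(plus_on_ab)            # None
print(plus_on_aba.group(1))  # aba
```
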
Azri Jamil
6

+ requires at least one occurrence; * allows zero as well.

jeroen
5

A star is very similar to a plus. The only difference is that the plus matches 1 or more of the preceding character/group, while the star matches 0 or more.

Prashant Ghimire
Madara's Ghost
0

I think the previous answers fail to highlight a simple example:

for example we have an array:

numbers = [5, 15]

The following regex expression ^[0-9]+ matches: 15 only. However, ^[0-9]* matches both 5 and 15. The difference is that the + operator requires at least one duplicate of the preceding regex expression

kgui
  • Um, what?!? Why is this answer uv'ed at all? This is simply incorrect. Both patterns will definitely match strings `5` and `15`. – mickmackusa Jul 05 '21 at 09:26
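
The point raised in the comment can be checked directly. The sketch below uses Python's `re` (an assumption for illustration; `^[0-9]+` and `^[0-9]*` behave the same in PCRE) and treats the array entries as strings. The extra input "abc" is hypothetical, added to show where the two patterns really do differ.

```python
import re

numbers = ["5", "15"]

# Both patterns match both strings: "5" already satisfies "one or more digits".
plus_hits = [n for n in numbers if re.search(r"^[0-9]+", n)]
star_hits = [n for n in numbers if re.search(r"^[0-9]*", n)]

# The difference only shows when there is no leading digit at all:
# + finds nothing, while * succeeds with an empty match.
plus_on_letters = re.search(r"^[0-9]+", "abc")
star_on_letters = re.search(r"^[0-9]*", "abc")

print(plus_hits)                       # ['5', '15']
print(star_hits)                       # ['5', '15']
print(plus_on_letters)                 # None
print(repr(star_on_letters.group(0)))  # ''
```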