
Possible Duplicate:
what does lazy and greedy means in regexp?

I know that in a regex, a question mark after *, + or ? makes the quantifier ungreedy (lazy). But if I want to match any character, what is the difference between using (.*) and (.*?)?

Thanks.

EDIT: In my case I want to check a URL. What are the differences between

http://site\.net/(.*?)\.html

and

http://site\.net/(.*)\.html

?

mcont
  • 3
    http://stackoverflow.com/questions/2301285/what-does-lazy-and-greedy-means-in-regexp – Mike B Aug 30 '12 at 13:10
  • 3
    * See also [Open source RegexBuddy alternatives](http://stackoverflow.com/questions/89718/is-there) and [Online regex testing](http://stackoverflow.com/questions/32282/regex-testing) for some helpful tools that can analyze and explain given patterns, or [RegExp.info](http://regular-expressions.info/) for a nicer tutorial. – mario Aug 30 '12 at 13:11
  • 1
    @Matteo: Added explanation for your example – jgauffin Aug 30 '12 at 13:31

4 Answers


.* is greedy: it first consumes as many characters as it can, then gives characters back one at a time only as far as needed for the rest of the regex to match. In effect it runs up to the last place where the part of the regex that follows it can still match.

.*? is ungreedy (lazy): it consumes as few characters as it can and grows one character at a time only until the rest of the regex matches. In effect it stops at the first place where the part that follows it matches, even if .*? itself could still match more.

Example:

/(.*) dog/ will match "I think your dog bit my dog" and group 1 will be "I think your dog bit my".

/(.*?) dog/ will match "I think your dog bit my dog" and group 1 will be "I think your".
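
To try the two patterns above yourself, here is a minimal sketch. It assumes Python's re module (the question does not name an engine); the variable names are only for illustration.

```python
import re

text = "I think your dog bit my dog"

# Greedy: (.*) runs to the end, then backs off to the LAST " dog".
greedy = re.search(r"(.*) dog", text)

# Lazy: (.*?) grows one character at a time and stops at the FIRST " dog".
lazy = re.search(r"(.*?) dog", text)

print(greedy.group(1))  # I think your dog bit my
print(lazy.group(1))    # I think your
```

Both patterns match the same string; only the point where group 1 ends differs.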

FThompson

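If nothing follows the (.*) and the regex has to match up to the end of the string (for example because of a trailing $), then there is no difference in what is captured. Without such an anchor, a trailing (.*?) is already satisfied by the empty string. However, if anything follows the group, then there is a difference: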

"I went to the shops and then I went home"

/(.*) went/  => "[I went to the shops and then I] went"
/(.*?) went/ => "[I] went"
Gareth

Assume that you have this URL:

http://example.net/some/wierd/path.html?returnTo=somedoc.html

The greedy pattern would match the entire line:

http://example.net/some/wierd/path.html?returnTo=somedoc.html

while the non-greedy pattern returns:

http://example.net/some/wierd/path.html
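
The URL case can be checked with a short sketch. It assumes Python's re module and adapts the patterns from the question to the example.net host used in this answer; both of those are my assumptions, not something the question specifies.

```python
import re

url = "http://example.net/some/wierd/path.html?returnTo=somedoc.html"

# Patterns from the question, with the host swapped to example.net (assumption).
greedy = re.search(r"http://example\.net/(.*)\.html", url)
lazy = re.search(r"http://example\.net/(.*?)\.html", url)

# Greedy stops at the LAST ".html", so it swallows the query string too.
print(greedy.group(0))  # http://example.net/some/wierd/path.html?returnTo=somedoc.html
print(greedy.group(1))  # some/wierd/path.html?returnTo=somedoc

# Lazy stops at the FIRST ".html".
print(lazy.group(0))    # http://example.net/some/wierd/path.html
print(lazy.group(1))    # some/wierd/path
```

If the goal is to validate the whole URL rather than find a prefix, anchoring the pattern at the end (for example with $) changes the result, because the lazy group is then forced to expand until the final .html.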
jgauffin

As you already know what ungreedy behaviour is, I won't explain that again.

It depends on what comes after the (.*?); that is what ungreedy behaviour is for.

Interestingly enough, this means that a regex of the form /(.*?)/ doesn't make much sense, because how can you be lazy if you match everything anyway?

If you try to create this regex in e.g. Regexr, it won't even compile, because it's nonsense.

Only if you put something after the group will your regex make any kind of sense. I'm not sure whether all regex engines do the same as Regexr here and refuse to accept that regex.

So, if you want to match anything up to a certain character, you have to put that specific character after your ungreedy anything-group. This way, everything before that particular character is matched.

To conclude: it doesn't make any difference if there isn't something after the group, as long as the regex has to match the whole string.
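
One caveat on engines, as a minimal sketch: Python's re module (my choice for illustration; other engines may behave differently) does accept a bare (.*?). When searching, the lazy group is satisfied by the empty string, and only when the whole string must match do the two forms capture the same thing.

```python
import re

text = "abc"

# Searching: nothing follows the group, so lazy stops immediately.
bare_lazy = re.search(r"(.*?)", text)
bare_greedy = re.search(r"(.*)", text)
print(repr(bare_lazy.group(1)))    # ''
print(repr(bare_greedy.group(1)))  # 'abc'

# Whole-string match: the lazy group is forced to expand to the end.
full_lazy = re.fullmatch(r"(.*?)", text)
full_greedy = re.fullmatch(r"(.*)", text)
print(repr(full_lazy.group(1)))    # 'abc'
print(repr(full_greedy.group(1)))  # 'abc'
```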

F.P