1

For the regex ".*?([a-m/]*).*" matched against the string "fall/2005", I thought the ".*" would match any character 0 or more times. However, since a ? follows the .*, I assumed it only matches 0 or 1 repetitions. So I thought .*? would match 'f', but I'm wrong.

What is wrong in my logic?

Pshemo
  • 122,468
  • 25
  • 185
  • 269
Iann Wu
  • 155
  • 2
  • 12
  • See [this question](http://stackoverflow.com/questions/5319840/greedy-vs-reluctant-vs-possessive-quantifiers) – David Z Oct 04 '13 at 05:01

4 Answers

10

The ? here acts as a 'modifier', if I can call it that, and makes .* match as little as possible (termed 'lazy'): only as much as is needed for the next part of the pattern to match.

In fall/2005, the first .*? stops at the first position where ([a-m/]*) can match, which is just before f. Hence, .*? matches 0 characters, ([a-m/]*) matches fall/, and since ([a-m/]*) cannot match any more, the next part of the pattern, .*, matches what's left in the string, meaning 2005.

In contrast, with .*([a-m/]*).*, the first .* matches as much as possible (meaning the whole string) and only gives characters back if the other terms need them to match. Since the other quantifiers can match 0 characters as well, nothing is given back, so the first .* alone matches the whole string (termed 'greedy').
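To make the difference concrete, here is a minimal Java sketch that prints what the capture group ends up holding for both patterns. The class and method names (`LazyVsGreedyGroups`, `group1`) are made up for illustration, and it assumes `matches()` semantics, where the whole string has to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedyGroups {
    // Returns the content of capture group 1 when the whole input matches,
    // or null if the pattern does not match the entire input.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "fall/2005";
        // Lazy prefix: .*? takes nothing, so the group gets "fall/".
        System.out.println("[" + group1(".*?([a-m/]*).*", input) + "]");
        // Greedy prefix: .* takes everything, so the group stays empty.
        System.out.println("[" + group1(".*([a-m/]*).*", input) + "]");
    }
}
```

The brackets are only there so the empty group in the greedy case is visible in the output.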


Maybe a different example will help.

.*ab

In:

aaababaaabab

Here, .* will match as many characters as possible and then try to match ab. Thus, .* will match aaababaaab and the remainder will be matched by ab.

.*?ab

In:

aaababaaabab

Here, .*? will match as little as possible until it can match ab in that regex. The first occurrence of ab is here:

aaababaaabab
  ^^

And so, .*? matches aa while ab will match ab.
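The two cases can be checked with a short Java sketch. The parentheses around `.*` and `.*?` are added here only so the part taken by the quantifier can be inspected; the helper name `prefix` is hypothetical, and `find()` is assumed as the search method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedyFind {
    // Returns what group 1 captured in the first match found, or null if none.
    static String prefix(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "aaababaaabab";
        // Greedy: the group takes everything up to the last "ab".
        System.out.println(prefix("(.*)ab", input));
        // Lazy: the group stops right before the first "ab".
        System.out.println(prefix("(.*?)ab", input));
    }
}
```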

Jerry
  • 70,495
  • 13
  • 100
  • 144
  • 2
    Finally, someone who fully explains this correctly. The other answers make it sound like `.*?` always matches only the empty string. Mind showing a counter-example to demonstrate when `.*?` would actually match something, maybe even when it's helpful (i.e. `z01234z56789z`, trying to get only what is between the first two `z`s)? – Nicole Oct 04 '13 at 05:09
  • 1
    @NickC I was crafting one :) – Jerry Oct 04 '13 at 05:13
  • @Jerry do you live on here? ;) – hwnd Oct 04 '13 at 05:18
  • Hi Jerry, thanks for the detailed explanation and the example. For .*?ab, doesn't it fail to match the whole string aaababaaabab, since nothing is left in the pattern to match the remaining abaaabab after the first aaab? – Iann Wu Oct 04 '13 at 05:19
  • `.*?` would match the first `aa` unless you used the `g` modifier. – hwnd Oct 04 '13 at 05:22
  • @IannWu It depends on the function you're using in Java. For instance, `matches` will try to match the whole string, while `find` on a `Matcher` will look for a match anywhere in the string. Most regex functions look for a match anywhere rather than requiring the whole string to match, and that's the basis of regex searches :) – Jerry Oct 04 '13 at 05:23
  • @hwnd There's no g modifier in Java. – Jerry Oct 04 '13 at 05:24
  • Ahh I meant in general, but yes `matches` in Java, sorry I am used to PCRE =) – hwnd Oct 04 '13 at 05:25
  • @IannWu I'm glad it helped you! ^^ – Jerry Oct 04 '13 at 05:25
  • @hwnd Yup, that's another reason why the language is important in regex, because of those different flavours! And Java's one of the flavours I don't particularly like ^^; But oh well, gotta do what we can with it. – Jerry Oct 04 '13 at 05:26
  • @hwnd Also, no, I don't live on here. I just happened to be around xD – Jerry Oct 04 '13 at 05:27
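The points raised in the comments above (whole-string matching versus searching, repeated matching, and the `z01234z56789z` case) can be tried out with a small Java sketch. The class and helper names are invented for illustration; repeated `find()` calls are used here as a rough stand-in for what a `g` modifier does in other flavours.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // Collects every successive match that find() reports.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Returns the text between the first two z's, or null if there is none.
    static String betweenFirstTwoZ(String input) {
        Matcher m = Pattern.compile("z(.*?)z").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "aaababaaabab";
        // matches() requires the whole string to match, so the lazy .*?
        // is forced to grow until the final "ab" ends the string.
        System.out.println(s.matches(".*?ab"));
        // Repeated find() yields the shortest match from each position.
        System.out.println(allMatches(".*?ab", s));
        // Here the lazy quantifier does consume characters.
        System.out.println(betweenFirstTwoZ("z01234z56789z"));
    }
}
```

So a lazy quantifier does not make a whole-string match fail: it starts small but keeps expanding as long as the rest of the pattern needs it to.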
3

In regex:

? : Occurs zero or one time; ? is short for {0,1}

*? : ? after a quantifier makes it a reluctant quantifier, it tries to find the smallest match.
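As a rough illustration of the two roles of ?, the Java sketch below compares a few patterns on the same input. The helper `firstMatch` is a made-up name, and `find()` is assumed, so the first match anywhere in the input is reported.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierForms {
    // Returns the first match found anywhere in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "aaab";
        System.out.println("[" + firstMatch("a?", input) + "]");     // ? as a quantifier
        System.out.println("[" + firstMatch("a{0,1}", input) + "]"); // same thing spelled out
        System.out.println("[" + firstMatch("a*", input) + "]");     // greedy
        System.out.println("[" + firstMatch("a*?", input) + "]");    // reluctant, nothing forces growth
        System.out.println("[" + firstMatch("a*?b", input) + "]");   // reluctant, must reach the b
    }
}
```

The last two lines are the interesting pair: the reluctant quantifier takes nothing when it can get away with it, but still expands when the following `b` requires it.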

Juned Ahsan
  • 67,789
  • 12
  • 98
  • 136
1

Suppose you have an input string like this

this is stackoverflow

and you use the regex

.*

then the output will be

this is stackoverflow

but if you use a lazy quantifier followed by something it has to reach, such as whitespace,

(.*?)\s

the captured group will be

this

So the example shows that .* gives you the whole string. To prevent this and get only the characters before the first space, use .*? followed by the space. On its own, .*? matches the empty string, because nothing after it forces it to take more.
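These patterns can be tried directly in Java. The sketch below is an assumption-laden check rather than a definitive test: it uses `find()`, a hypothetical helper `first`, and adds a capture group plus `\s` to show one way of actually getting `this`, since a bare `.*?` under `find()` yields an empty match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWord {
    // Returns the given group of the first match, or null if nothing matches.
    static String first(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String input = "this is stackoverflow";
        // Greedy: the whole line.
        System.out.println("[" + first(".*", input, 0) + "]");
        // Lazy with nothing after it: the empty string.
        System.out.println("[" + first(".*?", input, 0) + "]");
        // Lazy, but forced to reach the first whitespace: "this".
        System.out.println("[" + first("(.*?)\\s", input, 1) + "]");
    }
}
```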

For more practical knowledge you can check http://regexpal.com/

user2379020
  • 322
  • 3
  • 13
1

The ? (question mark) makes the preceding quantifier lazy here, also called non-greedy.

Your regular expression:

.*?         any character except newline \n (0 or more times)
            (matching the least amount possible)

 (          group and capture to \1:

 [a-m/]*    any character of: 'a' to 'm', '/' (0 or more times)
            (matching the most amount possible)

 )          end of \1

 .*         any character except newline \n (0 or more times)
            (matching the most amount possible)
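
The "any character except newline" part of the breakdown above can be seen in a short Java sketch. The class and method names are hypothetical; `Pattern.DOTALL` is the standard flag that lets `.` match line terminators as well.

```java
import java.util.regex.Pattern;

class DotAndNewline {
    // True if ".*" compiled with the given flags matches the entire input.
    static boolean dotStarMatchesAll(String input, int flags) {
        return Pattern.compile(".*", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String twoLines = "fall\n2005";
        // By default . stops at the newline, so the whole input cannot match.
        System.out.println(dotStarMatchesAll(twoLines, 0));
        // With DOTALL, . also matches the newline.
        System.out.println(dotStarMatchesAll(twoLines, Pattern.DOTALL));
    }
}
```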
hwnd
  • 69,796
  • 4
  • 95
  • 132