4

What do the following two regular expressions mean?

.*? and .+?

I understand what the individual symbols mean, i.e.

'.' -> Any character
'*' -> 0 or more times
'+' -> once or more times
'?' -> 0 or 1

But I am confused about how they combine in .*? and .+?. Could anybody show proper examples for these two cases?

You are also most welcome to share good links that present useful examples and practices. Thanks in advance.

puru

3 Answers

7

Breaking the two expressions down:

.*?
. - Any character
* - Any number of times (zero or more)
? - Consumed reluctantly

.+?
. - Any character
+ - At least once
? - Consumed reluctantly

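A reluctant or "non-greedy" quantifier (the '?' placed directly after another quantifier) matches as little as possible while still letting the overall pattern match. Note that in this position '?' no longer means "0 or 1"; it modifies the preceding '*' or '+'. A more in-depth look at quantifiers (greedy, reluctant and possessive) can be found here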
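As a rough illustration of "as little as possible", here is a minimal sketch comparing the greedy and reluctant forms. The class name `GreedyVsReluctant`, the helper `findAll`, and the angle-bracket sample inputs are all made up for this example; only `java.util.regex` is the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Greedy .+ grabs as much as it can, then backs off to the last '>'.
        System.out.println(findAll("<.+>", "<a><b>"));   // [<a><b>]
        // Reluctant .+? stops at the first '>' that completes a match.
        System.out.println(findAll("<.+?>", "<a><b>"));  // [<a>, <b>]
        // .*? may match nothing between the brackets ...
        System.out.println(findAll("<.*?>", "<><a>"));   // [<>, <a>]
        // ... while .+? must consume at least one character first.
        System.out.println(findAll("<.+?>", "<><a>"));   // [<><a>]
    }
}
```

The last two lines show the only difference between the two expressions: `.*?` is allowed to match zero characters, `.+?` is not.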

Rich O'Kelly
3

.*? and .+? are reluctant quantifiers.

They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

Consider the code:

        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class ReluctantDemo {
            public static void main(String[] args) {
                String lines = "some";

                // .+? needs at least one character, so each match is exactly one character.
                Matcher matcher = Pattern.compile(".+?").matcher(lines);
                while (matcher.find()) {
                    String result = matcher.group();
                    System.out.println("RESULT of .+? : " + result);
                    System.out.println("RESULT LENGTH : " + result.length());
                }
                System.out.println(lines);

                // .*? may match nothing, so it matches the empty string at every position.
                Matcher matcher1 = Pattern.compile(".*?").matcher(lines);
                while (matcher1.find()) {
                    int start = matcher1.start();
                    int end = matcher1.end();
                    String result = matcher1.group();
                    System.out.println("RESULT of .*? : " + result);
                    System.out.println("RESULT LENGTH : " + result.length() + " ,  start " + start + " end :" + end);
                }
            }
        }

OUTPUT:

RESULT of .+? : s
RESULT LENGTH : 1
RESULT of .+? : o
RESULT LENGTH : 1
RESULT of .+? : m
RESULT LENGTH : 1
RESULT of .+? : e
RESULT LENGTH : 1
some
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 0 end :0
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 1 end :1
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 2 end :2
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 3 end :3
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 4 end :4

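.+? must consume at least one character, and because it is reluctant it stops as soon as it has one. So it matches each character separately (length 1).

.*? is allowed to consume nothing, and because it is reluctant it prefers to. So it matches the empty string at every position, including the end of the input (start 4, end 4).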
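On their own the two expressions never need to grow, which is why the matches above are so short. When something follows the quantifier, it expands one character at a time until the rest of the pattern can match. A minimal sketch of that, reusing the input "some" (the class name `ReluctantWithSuffix`, the helper `findAll`, and the trailing literals are assumptions chosen for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantWithSuffix {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "some";
        // Nothing follows the quantifier: it takes the minimum every time.
        System.out.println(findAll(".+?", input).size()); // 4 one-character matches
        System.out.println(findAll(".*?", input).size()); // 5 empty matches (positions 0..4)
        // A literal follows: the quantifier grows until that literal matches.
        System.out.println(findAll(".+?e", input)); // [some]
        System.out.println(findAll(".*?m", input)); // [som]
        // Only here do * and + differ: .+? needs a character before the 's'.
        System.out.println(findAll(".*?s", input)); // [s]
        System.out.println(findAll(".+?s", input)); // []
    }
}
```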

Sujith PS
  • Thanks a lot. But I guess (.*) creates a group. In my case it is just .*? and .+?. Could you show a relevant example for these two? – puru Jan 08 '14 at 09:52
2

To illustrate, consider the input string xfooxxxxxxfoo.

Enter your regex: .*foo  // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo  // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f" "o" "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters ("f" "o" "o") have already been consumed. So the matcher slowly backs off one letter at a time until the rightmost occurrence of "foo" has been regurgitated, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by first consuming "nothing". Because "foo" doesn't appear at the beginning of the string, it's forced to swallow the first letter (an "x"), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13.

The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+, leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.
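The three runs above can be reproduced without the interactive test harness. The following is only a sketch: the class name `QuantifierKinds`, the helper `findAll`, and the output format are made up here, and the indices follow `Matcher.start()` and `Matcher.end()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierKinds {
    // Hypothetical helper: returns each match as "text" [start,end).
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(String.format("\"%s\" [%d,%d)", m.group(), m.start(), m.end()));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: eats everything, then backs off to the last "foo".
        System.out.println(findAll(".*foo", input));  // ["xfooxxxxxxfoo" [0,13)]
        // Reluctant: grows only until the first "foo", then starts again.
        System.out.println(findAll(".*?foo", input)); // ["xfoo" [0,4), "xxxxxxfoo" [4,13)]
        // Possessive: eats everything and never backs off, so "foo" cannot match.
        System.out.println(findAll(".*+foo", input)); // []
    }
}
```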

This explanation comes from the Oracle Java tutorial on quantifiers: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html

Lakshmi