1

I'm trying to understand the following code:

Pattern.compile("(.*?):")

I already did some research about what it could mean, but I don't quite get it:

According to the Java docs, * means zero or more times, while ? means once or not at all.

Also, what does the ':' mean?

Thanks

Andreas
  • 3
    Check [this](http://rick.measham.id.au/paste/explain.pl?regex=%28.*%3F%29%3A) out. – Kendall Frey Sep 03 '12 at 13:29
  • @KendallFrey: Nice, but in Java `.` is equivalent to `[^\n\r\u0085\u2028\u2029]`. [ref](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#lt) – Alan Moore Sep 03 '12 at 13:46

4 Answers

5

This is called a reluctant quantifier. An asterisk and a question mark *? together mean "zero or more times, without matching more characters from the input than is needed". This is what prevents the dot . expression from matching the subsequent colon : in the input.

A better expression to match the same sequence is [^:]*:, because it lets you avoid backtracking. Here is a link to an article explaining why.
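As a minimal sketch of that comparison, the snippet below runs both patterns against the same single-line input. The class name `ColonPrefixDemo`, the helper `firstGroup`, and the sample string are made up for illustration, and the second pattern is wrapped in a capturing group so the two results can be compared directly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonPrefixDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Hello: This is a Test:";

        // Reluctant dot-star: grows one character at a time until a colon follows.
        System.out.println(firstGroup("(.*?):", input));   // prints Hello

        // Negated character class: cannot run past a colon, so it stops there directly.
        System.out.println(firstGroup("([^:]*):", input)); // prints Hello
    }
}
```

The two patterns agree on single-line input like this one. They can differ when the input contains a line break, because `.` does not match line terminators by default while `[^:]` does.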

Sergey Kalinichenko
  • 1
    The word is *quantifier*, not *qualifier*: it specifies a quantity. In fact, you might say the `?` qualifies the quantifier; `*` is normally greedy, but the `?` "weakens" it, making it reluctant or non-greedy. – Alan Moore Sep 03 '12 at 15:41
  • @AlanMoore You're right, it is quantifier. I edited the answer to fix that. – Sergey Kalinichenko Sep 03 '12 at 15:45
4

The ? after a greedy quantifier such as + or * makes it non-greedy. Without the ?, .* consumes as many characters as it can, including any earlier :, so the match ends at the last colon.

As it is, the regex matches whatever text comes before the first colon (:). The colon is not a special character here. Whatever comes before the colon is captured in a group, which can be accessed later through a Matcher object.

This code snippet will hopefully make things clearer:

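    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main
    {
        public static void main(String[] args)
        {
            String str = "Hello: This is a Test:";
            Pattern p1 = Pattern.compile("(.*?):"); // reluctant: stops at the first colon
            Pattern p2 = Pattern.compile("(.*):");  // greedy: runs to the last colon

            Matcher m1 = p1.matcher(str);
            if (m1.find())
            {
                System.out.println(m1.group(1));
            }

            Matcher m2 = p2.matcher(str);
            if (m2.find())
            {
                System.out.println(m2.group(1));
            }
        }
    }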

Yields:

Hello

Hello: This is a Test

npinti
1

This regular expression matches anything ending with a :, or put another way, anything up to the first :.

Here ':' has no special meaning; it simply matches a literal colon. So a string such as anystring: matches this pattern.
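A small sketch of how this plays out when the pattern is applied repeatedly: each call to `find()` picks up the next stretch of text ending in a colon. The class name `ColonSegments`, the method `segments`, and the sample inputs are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonSegments {
    // Hypothetical helper: collects group 1 of every successive match of (.*?): in the input.
    static List<String> segments(String input) {
        Matcher m = Pattern.compile("(.*?):").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1)); // the text before the colon; the colon itself is not captured
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segments("anystring:"));             // [anystring]
        System.out.println(segments("Hello: This is a Test:")); // two items; the second keeps its leading space
        System.out.println(segments("no colon here"));          // []
    }
}
```

Input without any colon yields no match at all, since the literal : in the pattern is required.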

Abhishek bhutra
0

I think the '?' is redundant and will be applied on '.*'.

':' has no special meaning whatsoever in regexps and will be matched to the characters in the string.

EDIT: dasblinkenlight is right: a greedy quantifier tries to match as much as it can. He is right in his suggestion as well.

I found a link which lists greedy vs reluctant: What is the difference between `Greedy` and `Reluctant` regular expression quantifiers?

BlueTrin