I've run into strange behavior in java.util.regex.Matcher. Consider this example:

    Pattern p = Pattern.compile("\\d*");
    String s = "a1b";
    Matcher m = p.matcher(s);
    while(m.find())
    {
        System.out.println(m.start()+" "+m.end());
    }

It produces this output:

    0 0
    1 2
    2 2
    3 3

I can understand every line except the last. The matcher reports an extra match (3,3) past the last character of the string. But the Javadoc for the start() method states:

start() Returns the start index of the previous match.

The same happens with the dot-star pattern:

    Pattern p = Pattern.compile(".*");
    String s = "a1b";
    Matcher m = p.matcher(s);
    while(m.find())
    {
        System.out.println(m.start()+" "+m.end());
    }

Output:

    0 3
    3 3

But if I specify line boundaries:

    Pattern p = Pattern.compile("^.*$");

The output will be "right":

    0 3

Can someone explain the reason for this behavior?

VLAZ

1 Answer

The pattern "\\d*" matches zero or more digits. The same goes for ".*": it matches zero or more occurrences of any character except a line terminator. Both patterns can therefore match the empty string.

The last match that you get is the empty string at the end of your input, after "b". The empty string satisfies the pattern \\d*. If you change the pattern to \\d+, which requires at least one digit, you'll get the expected result.
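As a rough illustration of the difference between the two quantifiers, the sketch below collects the start and end of every match that find() reports. The class name `EmptyMatchDemo` and the helper `spans` are made up for this example; only `Pattern` and `Matcher` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
public class EmptyMatchDemo {

    // Collects "start end" for every match that find() reports.
    public static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + " " + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        // \d* may match zero digits, so empty matches show up,
        // including the one at the very end of the input.
        System.out.println(spans("\\d*", "a1b")); // [0 0, 1 2, 2 2, 3 3]

        // \d+ needs at least one digit, so only "1" is reported.
        System.out.println(spans("\\d+", "a1b")); // [1 2]
    }
}
```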

Similarly, the pattern .* matches everything from the first character to the last. Thus it first matches "a1b". After that, the cursor is after b: "a1b|". When matcher.find() runs again, it finds a zero-length string at the cursor, which satisfies the pattern .*, so it reports it as a match.
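To make the two .* matches visible, here is a small sketch that prints the matched text next to its indices. `DotStarDemo` and `describe` are illustrative names, not anything from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing what each find() call actually matched.
public class DotStarDemo {

    // Returns one entry per match in the form: start end "matched text"
    public static List<String> describe(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + " " + m.end() + " \"" + m.group() + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        // First find(): greedy .* consumes "a1b" (0..3).
        // Second find(): starts at index 3 and matches the empty string there.
        for (String line : describe(".*", "a1b")) {
            System.out.println(line);
        }
        // 0 3 "a1b"
        // 3 3 ""
    }
}
```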

The reason it gives the expected output with "^.*$" is that the trailing empty string doesn't satisfy the ^ anchor. It is not at the beginning of the input, so the match fails.
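One way to check that it is the ^ anchor, and not $, that suppresses the trailing empty match is to try each anchor separately. This is a sketch under default flags (no Pattern.MULTILINE); `AnchorDemo` and `spans` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing anchored variants of .*
public class AnchorDemo {

    // Collects "start end" for every match that find() reports.
    public static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + " " + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        // Both anchors: the empty string at index 3 fails ^, so one match.
        System.out.println(spans("^.*$", "a1b")); // [0 3]

        // Only $: the empty string at index 3 is at the end, so it still matches.
        System.out.println(spans(".*$", "a1b"));  // [0 3, 3 3]

        // Only ^: index 3 is not the start of the input, so no second match.
        System.out.println(spans("^.*", "a1b"));  // [0 3]
    }
}
```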

Rohit Jain
  • Yes, you are right about \\d+, but what "empty string" do you mean? I didn't see any mention of it in the Javadocs. – Gusev Dmitry Apr 09 '14 at 18:56
  • @user3516622 The empty string is the one after the last character of the string. Technically, there is an empty string before and after every character in a string. – Rohit Jain Apr 09 '14 at 18:58
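
The "empty string before and after every character" idea can be checked with a pattern that matches nothing but the empty string. The sketch below is only an illustration; `EmptyPatternDemo` and `positions` are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: where does the empty pattern match?
public class EmptyPatternDemo {

    // Returns the start index of every match of the empty pattern.
    public static List<Integer> positions(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile("").matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // One empty match before each character, plus one after the last.
        System.out.println(positions("a1b")); // [0, 1, 2, 3]
    }
}
```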