
I ran into a bug caused by expecting the matches() method to find exactly the same match as find(). Normally this is the case, but it appears that if a non-greedy pattern can be stretched to accept the whole string, that is allowed. This seems like a bug in Java. Am I wrong? I don't see anything in the docs that indicates this behavior.

Pattern stringPattern = Pattern.compile("'.*?'");
String nonSingleString = "'START'===stageType?'active':''";
Matcher m1 = stringPattern.matcher(nonSingleString);
boolean matchesCompleteString = m1.matches();
System.out.println("Matches complete string? " + matchesCompleteString);
System.out.println("What was the match? " + m1.group()); //group() gets the string that matched

Matcher m2 = stringPattern.matcher(nonSingleString);
boolean foundMatch = m2.find(); //this looks for the next match
System.out.println("Found a match in at least part of the string? " + foundMatch);
System.out.println("What was the match? " + m2.group());

Outputs

Matches complete string? true
What was the match? 'START'===stageType?'active':''
Found a match in at least part of the string? true
What was the match? 'START'

Russell Leggett
  • Since `matches` only succeeds if it matches the entire string, I believe it follows that if you use `m.matches()` and it succeeds, the no-argument `m.group()` _always_ returns the entire input string. – ajb Jul 10 '14 at 16:34

2 Answers


This makes perfect sense.

The matches(...) method must attempt to consume the whole string, so it does, even with a non-greedy pattern.

The find(...) method may match a substring, so it stops as soon as it finds any matching substring.
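A minimal sketch of that difference, using the pattern and input from the question. The class name `ReluctantDemo` and its two helper methods are made up for illustration. It also runs find() with explicit `\A` and `\z` anchors to show that the reluctant quantifier expands as far as it must once the pattern is required to cover the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Hypothetical helper: text of the first substring located by find(), or null.
    static String firstFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: text matched by matches(), or null if the whole input does not match.
    static String fullMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "'START'===stageType?'active':''";

        // Unanchored find(): the reluctant .*? stops at the first closing quote.
        System.out.println(firstFind("'.*?'", input));        // 'START'

        // Anchored find(): .*? has to keep expanding until the final quote.
        System.out.println(firstFind("\\A'.*?'\\z", input));  // 'START'===stageType?'active':''

        // matches(): same result as the anchored find().
        System.out.println(fullMatch("'.*?'", input));        // 'START'===stageType?'active':''
    }
}
```

Reluctant only changes the order in which the engine tries alternatives (shortest first); it does not forbid longer matches when the shorter ones fail.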

Jamie Cockburn
  • +1. Put differently, reluctant quantifiers match as few items as possible, and the fewest items it can match in order to cover the entire string is all of them. – that other guy Jul 10 '14 at 16:25
  • I understand that they do different things, but I would expect *a match* to match on the same string, not change the behavior of a non-greedy pattern. – Russell Leggett Jul 10 '14 at 18:55
  • I guess what I would say is that this isn't intuitive to me, so while I can accept it's not a bug, it doesn't follow my expectations. My mental model was that matches() is basically like doing a find() and then seeing if it consumed the whole input. Clearly that is not the case. – Russell Leggett Jul 10 '14 at 19:08
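
The mental model described in the last comment can be written out and compared against matches() directly. This is only a sketch: `MentalModelDemo` and `findCoversWholeInput` are hypothetical names, not part of `java.util.regex`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MentalModelDemo {
    // Hypothetical helper: "do a find(), then check whether it consumed the whole input".
    static boolean findCoversWholeInput(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() && m.start() == 0 && m.end() == input.length();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("'.*?'");
        String input = "'START'===stageType?'active':''";

        // find() stops at 'START', which does not cover the input.
        System.out.println(findCoversWholeInput(p, input));   // false

        // matches() lets .*? expand until the whole input is covered.
        System.out.println(p.matcher(input).matches());       // true

        // For an input with no inner quotes, the two agree.
        System.out.println(findCoversWholeInput(p, "'abc'")); // true
        System.out.println(p.matcher("'abc'").matches());     // true
    }
}
```

The two approaches differ only when the first match found is shorter than the input but a longer match covering everything also exists.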

They are supposed to be different. Matcher#matches attempts to match the complete input string, as if your regex were anchored at both ends (effectively \A(?:regex)\z rather than a literal ^ and $), whereas Matcher#find looks for the next substring that your regex can match.

As per Javadoc:

public boolean matches()

Attempts to match the entire region against the pattern. If the match succeeds then more information can be obtained via the start, end, and group methods.

and

public boolean find()

Attempts to find the next subsequence of the input sequence that matches the pattern.
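
Since matches() accepts any way of fitting the pattern to the entire region, rejecting an input like the one in the question has to be done by the pattern itself. The sketch below assumes the intent was to accept only a single quoted string with no inner quotes (a guess based on the variable name `nonSingleString`). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStrings {
    // Assumption: a "single string" may not contain a single quote between its delimiters.
    private static final Pattern SINGLE = Pattern.compile("'[^']*'");

    // True only if the entire input is one quoted string.
    static boolean isSingleQuotedString(String input) {
        return SINGLE.matcher(input).matches();
    }

    // Collects every quoted string by calling find() until it returns false.
    static List<String> allQuotedStrings(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = SINGLE.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "'START'===stageType?'active':''";
        System.out.println(isSingleQuotedString(input));     // false
        System.out.println(isSingleQuotedString("'START'")); // true
        System.out.println(allQuotedStrings(input));         // ['START', 'active', '']
    }
}
```

With the negated character class there is no way to stretch the match across an inner quote, so matches() and find() no longer disagree about what counts as one string.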

Ramvignesh
anubhava