
In the following, I expect the second find() to succeed, but it does not. Why?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Matcher matcher =
    Pattern.compile("\\s*asdf").matcher("apple banana cookie");

// returns false as expected
matcher.find();

// resets groups (which weren't being used explicitly anyway), but not state.
matcher.usePattern(Pattern.compile("\\s*banana")); 

// returns false, expected true.
System.out.println(matcher.find());

If the quantifier is removed from the first regex (making it simply "asdf"), the second match succeeds. Inspecting the Matcher object shows that some group information is stored after the first, unsuccessful find(), which I wouldn't have expected. find() is supposed to start either at the beginning of the input (if there was no previous match) or just after the last successful match. usePattern() is supposed to preserve the Matcher's position in the input and discard group information (which, again, I wasn't using explicitly).
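For reference, the position-keeping behavior of usePattern() described above can be seen after a successful match: the next find() resumes where that match ended, so an earlier occurrence of the new pattern is skipped. This is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsePatternKeepsPosition {
    // Finds "banana" first, then swaps in a pattern for "apple", which occurs earlier in the input.
    static boolean findsAppleAfterBanana(String input) {
        Matcher matcher = Pattern.compile("banana").matcher(input);
        if (!matcher.find()) {
            return false;
        }
        // usePattern() discards group information but keeps the position (the end of "banana").
        matcher.usePattern(Pattern.compile("apple"));
        // The next find() resumes after "banana", so the earlier "apple" is never seen.
        return matcher.find();
    }

    public static void main(String[] args) {
        System.out.println(findsAppleAfterBanana("apple banana cookie")); // false
    }
}
```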

I'm missing something, but I don't know what. I suspect I have to implement this with lookingAt() and by updating the region (such as this example), but I don't understand why this approach isn't working.
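A sketch of the lookingAt()-plus-region approach, assuming the goal is to try several patterns at a known position and only advance that position on success. matchAt is a hypothetical helper, not part of the Matcher API. Because region() resets the matcher, nothing left over from an earlier failed attempt is reused.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionLookingAt {
    // Hypothetical helper: tries `pattern` anchored at `pos` and returns the end index
    // of the match, or -1 if the pattern does not match at that position.
    static int matchAt(Matcher matcher, Pattern pattern, int pos) {
        matcher.usePattern(pattern);
        // region() resets the matcher, then restricts it to [pos, current region end).
        matcher.region(pos, matcher.regionEnd());
        // lookingAt() only matches at the start of the region.
        return matcher.lookingAt() ? matcher.end() : -1;
    }

    public static void main(String[] args) {
        String input = "apple banana cookie";
        Matcher matcher = Pattern.compile("apple").matcher(input);
        int pos = matchAt(matcher, Pattern.compile("apple"), 0);        // 5
        int miss = matchAt(matcher, Pattern.compile("\\s*asdf"), pos);  // -1, pos is kept as is
        int hit = matchAt(matcher, Pattern.compile("\\s*banana"), pos); // 12
        System.out.println(pos + " " + miss + " " + hit);               // 5 -1 12
    }
}
```

The caller owns the position, so a failed attempt never moves it.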

VLAZ

2 Answers


Your first regex consumes the entire string (\\s*). When the second regex runs, there is nothing left to match.

If you call matcher.reset() before the second find(), it works as expected.
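A minimal sketch of that fix applied to the question's code (the class and method names are made up). reset() does not change the pattern set by usePattern(); it only discards the search state, so the second find() starts from the beginning of the input again.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetAfterFailedFind {
    // Replays the question's steps, adding reset() before the second find().
    static int secondFindStart(String input) {
        Matcher matcher = Pattern.compile("\\s*asdf").matcher(input);
        matcher.find(); // fails: there is no "asdf" in the input
        matcher.usePattern(Pattern.compile("\\s*banana"));
        // reset() keeps the new pattern but discards the search position left by the failed find().
        matcher.reset();
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(secondFindStart("apple banana cookie")); // 5 (the space before "banana")
    }
}
```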

Saurabh Gokhale
adotout
  • Ok, a careful rereading of the [Matcher JavaDoc](http://download.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html) with that in mind reveals that "A matcher finds matches in a subset of its input called the region. By default, the region contains all of the matcher's input." As far as I can tell, the documentation only describes the region() method as modifying the region (although it seems fairly clear that the region gets modified at other times, hence the confusion). So, does find() actually modify the region? – Rich Fletcher Jun 14 '11 at 01:36
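One way to check whether a failed find() touches the region is to read regionStart() and regionEnd() afterwards. A minimal sketch with a made-up class name; for this 19-character input, the region stays at its default bounds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionAfterFailedFind {
    // Returns {regionStart, regionEnd} as observed after a find() that fails.
    static int[] regionAfterFailedFind(String input) {
        Matcher matcher = Pattern.compile("\\s*asdf").matcher(input);
        matcher.find(); // fails: there is no "asdf" in the input
        return new int[] { matcher.regionStart(), matcher.regionEnd() };
    }

    public static void main(String[] args) {
        int[] region = regionAfterFailedFind("apple banana cookie");
        System.out.println(region[0] + " " + region[1]); // 0 19
    }
}
```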

It looks like the documentation is a little misleading, or rather it simply doesn't specify what happens when you call find() after a failed find().

I suppose that the expected usage is that find() is called repeatedly until failure, but never after failure without resetting.
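That usage pattern is the usual loop: keep calling find() until it returns false, and start over with a fresh (or reset) matcher afterwards. A minimal sketch; the class name and the \\w+ pattern are chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindLoop {
    // Collects every match, stopping at the first failed find().
    static List<String> allMatches(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        // The matcher is not used again after the failing call.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(allMatches(Pattern.compile("\\w+"), "apple banana cookie")); // [apple, banana, cookie]
    }
}
```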

Looking at the source code confirms that Matcher keeps an index (the field last) from which the next find() starts searching, and when find() fails, that index may have been advanced toward the end of the input and isn't reset.

reset() resets that index; usePattern() doesn't.
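If the goal is to keep a position across a failed attempt rather than start over, one option (not suggested in the answer itself) is to remember end() of the last successful match and pass it to find(int), which the Javadoc documents as resetting the matcher before searching from the given index. A sketch with made-up names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResumeFromSavedIndex {
    // Hypothetical example: resume searching at the end of the last successful match.
    static int resumeAfterFailedAttempt(String input) {
        Matcher matcher = Pattern.compile("apple").matcher(input);
        if (!matcher.find()) {
            return -1;
        }
        int savedEnd = matcher.end(); // where the last successful match ended
        matcher.usePattern(Pattern.compile("\\s*asdf"));
        matcher.find(); // fails; the matcher's internal position is not relied on afterwards
        matcher.usePattern(Pattern.compile("\\s*banana"));
        // find(int) resets the matcher, then searches starting at savedEnd.
        return matcher.find(savedEnd) ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(resumeAfterFailedAttempt("apple banana cookie")); // 5
    }
}
```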

trutheality
  • In my implementation I'm using the matcher to maintain state in a recursive call, so to solve my original problem I'm saving the old region and calling region(oldStart, oldEnd), which resets the matcher before setting the bounds. – Rich Fletcher Jun 14 '11 at 18:19
  • The region is different from the internal 'last' field. You can test it: calling find doesn't change the region. – trutheality Jun 14 '11 at 18:34
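The save-and-restore-region workaround from these comments can be sketched as follows, assuming a non-default region that has to survive the retry (the bound 5 and the names are arbitrary, for illustration only). The failed find() leaves regionStart() and regionEnd() alone, and region() resets the matcher before re-applying them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestoreRegion {
    // Sketch of the commenter's approach: save the region, swap the pattern, re-apply the region.
    static int retryInSavedRegion(String input) {
        Matcher matcher = Pattern.compile("\\s*asdf").matcher(input);
        matcher.region(5, input.length()); // arbitrary bounds: search only " banana cookie"
        matcher.find(); // fails, but the region bounds are left as they were
        int oldStart = matcher.regionStart();
        int oldEnd = matcher.regionEnd();
        matcher.usePattern(Pattern.compile("\\s*banana"));
        // region() resets the matcher, then re-applies the saved bounds.
        matcher.region(oldStart, oldEnd);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(retryInSavedRegion("apple banana cookie")); // 5
    }
}
```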