Assume a regular expression, which, via a Java Matcher object, is matched against a large number of strings:

String expression = ...; // The Regular Expression
Pattern pattern = Pattern.compile(expression);
String[] ALL_INPUT = ...; // The large number of strings to be matched

Matcher matcher; // Declared here, assigned inside the loop

for (String input:ALL_INPUT)
{
    matcher = pattern.matcher(input); // Create a new Matcher

    if (matcher.matches()) // Or whatever other matcher check
    {
         // Whatever processing
    }
}

The Java SE 6 Javadoc for Matcher describes the option of reusing the same Matcher object via the reset(CharSequence) method. As the source code shows, this is a bit less expensive than creating a new Matcher every time. So, unlike above, one could do:

String expression = ...; // The Regular Expression
Pattern pattern = Pattern.compile(expression);
String[] ALL_INPUT = ...; // The large number of strings to be matched

Matcher matcher = pattern.matcher(""); // Declare and initialize a matcher

for (String input:ALL_INPUT)
{
    matcher.reset(input); // Reuse the same matcher

    if (matcher.matches()) // Or whatever other matcher check
    {
         // Whatever processing
    }
}

Should one use the reset(CharSequence) approach above, or is it preferable to create a new Matcher object every time?

VLAZ
PNS
  • By all means reuse the `Matcher`. The only good reason to create a new `Matcher` is to ensure thread-safety. That's why you don't make a `public static Matcher m`; in fact, that's the reason a separate `Pattern` class exists in the first place. – Marko Topolnik Jul 09 '12 at 08:22
  • So, for single-threaded applications even as an instance or class variable, or for multi-threaded ones in which the Matcher object is created inside a method, reset() is fine, yes? – PNS Jul 09 '12 at 08:25
  • @MarkoTopolnik: I think separating the compilation of the regex from its application is another good reason for having a `Pattern` class. – Joachim Sauer Jul 09 '12 at 08:26
  • In every situation where you are sure there's only one user of `Matcher` at any point in time, it is OK to reuse it with `reset`. – Marko Topolnik Jul 09 '12 at 08:26
  • @JoachimSauer For that they could just have offered `public static Matcher compile(String regexp)`. It doesn't motivate a separate factory object. – Marko Topolnik Jul 09 '12 at 08:27
  • @MarkoTopolnik Thanks for the clarification. I have numerous scenarios of Matcher objects created inside static methods and heavily used, so replacing re-instantiation with reset() will definitely improve application performance. – PNS Jul 09 '12 at 08:30
  • @MarkoTopolnik: I don't follow. Would that `Matcher` then be a `Matcher` with no `CharSequence` to match on? So you'd suddenly have two different "states" of `Matcher`: with or without a string to be matched-on. – Joachim Sauer Jul 09 '12 at 08:30
  • @JoachimSauer I'm just writing a comment, not carefully designing the API as we speak. You must not imagine `Matcher` to be the one we are used to, but a blend of `Pattern` and `Matcher`. The API would just be different. For comparison you can see how they implemented XPath in jaxen. – Marko Topolnik Jul 09 '12 at 08:33
  • @MarkoTopolnik: ok, never mind my asking, I was just curious as how you'd have approached that. – Joachim Sauer Jul 09 '12 at 08:34
  • @JoachimSauer I've thought about it for a minute, it probably would make sense to have that kind of `Matcher` start its life in the state of "no input". That wouldn't be a new state, though, because the real JDK `Matcher` finds itself in more-or-less the same state when it is done processing a string. – Marko Topolnik Jul 09 '12 at 08:43
  • @MarkoTopolnik Can you please create a new answer to the question and copy-paste (a summary of) your comments there, so that I accept it and other people can mark it? Thanks! – PNS Jul 09 '12 at 09:11
  • Just for information, both reset() method and reset(CharSequence) method were introduced in java.util.regex.Matcher class in Java 1.5 and are there since then. – sactiw Jan 03 '14 at 16:08
  • In the "new Matcher" code, the premature declaration of the Matcher variable does not make any sense (it is not a C++ automatic variable), but the comment might look like it did to some readers. – EndlosSchleife Jul 06 '23 at 14:11

1 Answer

By all means reuse the `Matcher`. The only good reason to create a new `Matcher` is to ensure thread-safety. That's why you don't make a `public static Matcher m`; in fact, that's the reason a separate, thread-safe `Pattern` factory object exists in the first place.

In every situation where you are sure there's only one user of Matcher at any point in time, it is OK to reuse it with reset.
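
As a rough illustration of the advice above, here is a minimal sketch of one way to combine a shared `Pattern` with a reused `Matcher`. The class name `MatcherReuseSketch`, the method `countMatches`, and the regex `\d+` are placeholders invented for this example, not part of the question. The `Pattern` is shared as a `static final` field because it is safe to share across threads, while the `Matcher` is a local variable, so each call (and hence each thread) has its own instance and reuses it with `reset`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReuseSketch {

    // Pattern instances are immutable and safe to share between threads.
    // The regex "\\d+" is a placeholder chosen for this sketch.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // The Matcher is local to the method, so only one caller uses it at a
    // time; inside the loop it is reused via reset(CharSequence).
    static int countMatches(List<String> inputs) {
        Matcher matcher = DIGITS.matcher("");
        int count = 0;
        for (String input : inputs) {
            matcher.reset(input); // point the same Matcher at the next input
            if (matcher.matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("123", "abc", "42", "4x2");
        System.out.println(countMatches(inputs)); // prints 2
    }
}
```

Keeping the `Matcher` in a local variable rather than a static or shared field is what guarantees the "only one user at a time" condition here; a `Matcher` stored in a field would need the same guarantee from the surrounding code.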

Marko Topolnik