
According to this question, there is a big difference between find() and matches(), yet both provide results in some form.

As a kind of utility, the toMatchResult() function returns the current results of the matches() operation. I hope my assumption under (1) is valid. (regex is here)

        String line = "aabaaabaaabaaaaaab";
        String regex = "(a*b)a{3}";
        Matcher matcher = Pattern.compile(regex).matcher(line);
        matcher.find();
//        matcher.matches();(1) --> returns false because the regex doesn't match the whole string
        String expectingAab = matcher.group(1);
        System.out.println("actually: " + expectingAab);

Unfortunately, the following does not work at all (exception: No match found):

        String line = "aabaaabaaabaaaaaab";
        String regex = "(a*b)a{3}";
        String expectingAab = Pattern.compile(regex).matcher(line).toMatchResult().group(1);
        System.out.println("actually: " + expectingAab);

Why is that? My first assumption was that it doesn't work because the regex should match the whole string, but the same exception is thrown with the string value aabaaa as well...

Of course the matcher needs to be put into the correct state with find(), but what if I'd like to use a one-liner for it? I actually implemented a utility class for this:


protected static class FindResult{
    private final Matcher innerMatcher;
    public FindResult(Matcher matcher){
        innerMatcher = matcher;
        innerMatcher.find();
    }
    public Matcher toFindResult(){
        return  innerMatcher;
    }
}

public static void main(String[] args){
    String line = "aabaaabaaabaaaaaab";
    String regex = "(a*b)a{3}";
    String expectingAab = new FindResult(Pattern.compile(regex).matcher(line)).toFindResult().group(1);
    System.out.println("actually: " + expectingAab);
}

I know full well that this is not an optimal way to get a one-liner, especially because it puts a heavy load on the garbage collector.

Is there an easier, better solution for this?

It's worth noting that I'm looking for a solution for Java 8. The matching logic works differently from Java 9 onward.

Dávid Tóth
  • If you don't want to create new objects, why not just use a static method? You don't need to store any state. Do you just not like the aesthetics of something like `MatcherUtils.findResult(Pattern.compile("...").matcher("..."))`? – Sweeper Apr 07 '21 at 09:27
  • This is actually a valid point! Thanks, this is something I'd also accept as an answer, should there be no built-in functionality for this. – Dávid Tóth Apr 07 '21 at 09:45

2 Answers


The toMatchResult() method returns the state of the previous match operation, whether it was find(), lookingAt(), or matches().

Your line

String expectingAab = Pattern.compile(regex).matcher(line).toMatchResult().group(1);

does not invoke any of those methods, so there is never a previous match and group(1) always throws an IllegalStateException: No match found.
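As a minimal sketch of that state dependency (the class and method names here are only illustrative), the following reads group 1 once without any match operation and once after find():

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative demo class, not part of any library.
class MatchStateDemo {
    static final String LINE = "aabaaabaaabaaaaaab";
    static final String REGEX = "(a*b)a{3}";

    // No find(), lookingAt() or matches() was called, so there is no match to read.
    static String groupWithoutMatching() {
        Matcher matcher = Pattern.compile(REGEX).matcher(LINE);
        try {
            return matcher.toMatchResult().group(1);
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        }
    }

    // find() runs first, so the snapshot holds the first match.
    static String groupAfterFind() {
        Matcher matcher = Pattern.compile(REGEX).matcher(LINE);
        if (!matcher.find()) {
            return null;
        }
        return matcher.toMatchResult().group(1);
    }

    public static void main(String[] args) {
        System.out.println(groupWithoutMatching()); // prints IllegalStateException
        System.out.println(groupAfterFind());       // prints aab
    }
}
```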

If you want a one-liner to extract the first group of the first match, you could simply use

String expectingAab = line.replaceFirst(".*?(a*b)a{3}.*", "$1");

The pattern needs .*? before and .* after the actual match pattern, to consume the remaining string and only leave the first group as its content. The caveat is that if no match exists, it will evaluate to the original string.
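A small sketch of both outcomes, using a hypothetical helper named firstGroup and an arbitrary non-matching input "xyz" chosen only for illustration:

```java
// Illustrative demo class; the helper name is an assumption.
class ReplaceFirstExtract {
    // Replaces the whole input with group 1 of the first match, if there is one.
    static String firstGroup(String input) {
        return input.replaceFirst(".*?(a*b)a{3}.*", "$1");
    }

    public static void main(String[] args) {
        // A match exists: only the first group is left.
        System.out.println(firstGroup("aabaaabaaabaaaaaab")); // prints aab
        // No match: the input comes back unchanged.
        System.out.println(firstGroup("xyz"));                // prints xyz
    }
}
```

Because a failed match returns the input unchanged, a caller that must tell the two cases apart probably still needs an explicit find() check.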

So if you want matches() rather than find() semantics, you can use

String expectingNoMatch = line.replaceFirst("^(a*b)a{3}$", "$1");

which will evaluate to the original string with the example input, as it doesn’t match.
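For comparison, here is a sketch of the anchored variant applied to the example line and to the shorter input aabaaa from the question (the class and method names are made up for this example):

```java
// Illustrative demo class; the helper name is an assumption.
class AnchoredReplaceFirst {
    // Only rewrites the input when the pattern covers it from start to end.
    static String wholeMatchGroup(String input) {
        return input.replaceFirst("^(a*b)a{3}$", "$1");
    }

    public static void main(String[] args) {
        // The pattern does not cover the whole line, so nothing is replaced.
        System.out.println(wholeMatchGroup("aabaaabaaabaaaaaab")); // prints aabaaabaaabaaaaaab
        // The pattern covers the whole input, so only group 1 remains.
        System.out.println(wholeMatchGroup("aabaaa"));             // prints aab
    }
}
```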

If you want your utility method not to create a FindResult instance, just use a straightforward static method.

However, this is a typical case of premature optimization. The Pattern.compile invocation creates a Pattern object, plus a bunch of internal node objects representing the pattern elements, the matcher invocation creates a Matcher instance plus arrays to hold the groups, and the toMatchResult invocation creates another object instance, and of course, the group(1) invocation unavoidably creates a new string instance representing the result.

The creation of the FindResult instance is the cheapest step in this sequence. If you care about performance, keep the result of Pattern.compile when you use the pattern more than once, as that's the most expensive operation and the Pattern instance is immutable and shareable, as explicitly stated in its documentation.
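One way to follow that advice, sketched with hypothetical names (CachedPatternExample, firstGroupOrNull), is to compile the pattern once into a constant and create only a Matcher per call:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; the names are assumptions, not an existing API.
class CachedPatternExample {
    // Compiled once and reused; Pattern instances are immutable and shareable.
    private static final Pattern PREFIX = Pattern.compile("(a*b)a{3}");

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroupOrNull(String input) {
        Matcher matcher = PREFIX.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroupOrNull("aabaaabaaabaaaaaab")); // prints aab
        System.out.println(firstGroupOrNull("xyz"));                // prints null
    }
}
```

Returning null is just one choice for the no-match case; an Optional would work equally well.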

Of course, the string methods replaceFirst and replaceAll do no magic, but perform the same steps under the hood.
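To illustrate, the String.replaceFirst documentation describes the call as yielding the same result as the explicit Pattern and Matcher chain. The sketch below (class and method names are illustrative) runs both forms side by side:

```java
import java.util.regex.Pattern;

// Illustrative demo class comparing the two equivalent spellings.
class ReplaceFirstEquivalence {
    static final String REGEX = ".*?(a*b)a{3}.*";

    // Convenience form on String.
    static String viaString(String input) {
        return input.replaceFirst(REGEX, "$1");
    }

    // The explicit steps: compile, create a matcher, replace the first match.
    static String viaMatcher(String input) {
        return Pattern.compile(REGEX).matcher(input).replaceFirst("$1");
    }

    public static void main(String[] args) {
        String line = "aabaaabaaabaaaaaab";
        System.out.println(viaString(line));  // prints aab
        System.out.println(viaMatcher(line)); // prints aab
    }
}
```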

Eugene
Holger
  • Premature optimisation is a very good point. [You can't avoid creating objects](https://softwareengineering.stackexchange.com/a/149569/189201) :) – Sweeper Apr 07 '21 at 10:00
  • @Sweeper indeed and the actual costs of an object creation depend on its constructor’s work, not the allocation, i.e. a string creation copies the entire character content and `Pattern.compile` does a lot of actual work processing the pattern. Whereas the costs of an ephemeral object with no actual construction work, i.e. impacts on GC, have been analyzed in [this answer](https://stackoverflow.com/a/54619104/2711488) of mine… – Holger Apr 07 '21 at 10:05

The method doesn't need instance fields to work. It can just be a static helper:

class MatcherUtils {
  public static MatchResult findResult(Matcher matcher) {
    matcher.find();
    return matcher.toMatchResult();
  }
}

Usage:

MatchResult result = MatcherUtils.findResult(Pattern.compile("...").matcher("..."));

Note that you might want to handle the case when find can't find anything (Thanks for the one-liner, Holger!):

class MatcherUtils {
  public static Optional<MatchResult> findResult(Matcher matcher) {
    return Optional.of(matcher)
             .filter(Matcher::find)
             .map(Matcher::toMatchResult);
    /*
    if (matcher.find()) {
      return Optional.of(matcher.toMatchResult());
    } else {
      return Optional.empty();
    }
    */
  }
}
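
Usage of the Optional variant might look like the sketch below. The helper is repeated so the example compiles on its own, and the class name FindResultUsage, the method firstGroupOr and the fallback value are assumptions for illustration:

```java
import java.util.Optional;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; only findResult mirrors the helper shown above.
class FindResultUsage {
    // Same helper as above, repeated so this example is self-contained.
    static Optional<MatchResult> findResult(Matcher matcher) {
        return Optional.of(matcher)
                .filter(Matcher::find)
                .map(Matcher::toMatchResult);
    }

    // Extracts group 1 of the first match, or the fallback when nothing matches.
    static String firstGroupOr(String input, String fallback) {
        return findResult(Pattern.compile("(a*b)a{3}").matcher(input))
                .map(result -> result.group(1))
                .orElse(fallback);
    }

    public static void main(String[] args) {
        System.out.println(firstGroupOr("aabaaabaaabaaaaaab", "<none>")); // prints aab
        System.out.println(firstGroupOr("xyz", "<none>"));                // prints <none>
    }
}
```

This keeps the call site to a single expression while making the no-match case explicit, and it uses only APIs available in Java 8.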
Sweeper