Consider the following pieces of code:

Pattern p = Pattern.compile(Pattern.quote("[r.e.g.e.x]"));

and

Pattern p = Pattern.compile("\\Q" + "[r.e.g.e.x]" + "\\E");

As far as I know, they produce exactly the same output. I know that the first is easier to read, as stated in this answer. But which approach is better or faster?

u32i64

1 Answer

The statement in that answer:

Calling the Pattern.quote() method wraps the string in \Q...\E, which turns the text into a regex literal.

is, strictly speaking, not correct. Plain wrapping would give wrong results if the original string already contains \E: a \Q inside a quoted section is harmless, but a \E ends the quote early.

For instance, Pattern.quote("\\Q[r.e.g.e.x]\\E") produces "\\Q\\Q[r.e.g.e.x]\\E\\\\E\\Q\\E", not the plainly wrapped "\\Q\\Q[r.e.g.e.x]\\E\\E".

So wrapping the string in "\\Q" and "\\E" yourself is incorrect for inputs that contain "\\E" (an edge case, admittedly). Use Pattern.quote if you want to be safe.
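To make the edge case concrete, here is a minimal sketch. The class name and the helpers naiveQuote and matchesWhole are hypothetical and only serve the illustration, and the Windows-style path is just an assumed example of text that happens to contain \E. Depending on what follows the embedded \E, the manually wrapped regex either fails to compile or matches something other than the input, so the helper treats a compile error as "no match".

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuoteEdgeCase {
    // Hypothetical helper: the manual wrapping from the question.
    static String naiveQuote(String s) {
        return "\\Q" + s + "\\E";
    }

    // True only if the regex compiles and matches the whole input.
    static boolean matchesWhole(String regex, String input) {
        try {
            return Pattern.compile(regex).matcher(input).matches();
        } catch (PatternSyntaxException e) {
            return false; // the regex did not even compile
        }
    }

    public static void main(String[] args) {
        // Runtime text: C:\Exports\file.txt (note the embedded \E)
        String path = "C:\\Exports\\file.txt";

        System.out.println(matchesWhole(Pattern.quote(path), path)); // true
        System.out.println(matchesWhole(naiveQuote(path), path));    // false
    }
}
```

With Pattern.quote the path matches itself literally; with the manual wrapping, the quoted section ends right after "C:" and the rest of the path is interpreted as regex syntax.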

Doing the wrapping with "\\Q" and "\\E" yourself will be marginally faster when the string contains no "\\E", since you save a method call, an indexOf(..) and an if statement. That saving is likely negligible next to the cost of compiling the pattern, though, and in general you are better off using library methods: they tend to contain fewer bugs, and the bugs they do have are eventually fixed.

The source code of Pattern.quote reads:

public static String quote(String s) {
    int slashEIndex = s.indexOf("\\E");
    if (slashEIndex == -1)
        return "\\Q" + s + "\\E";

    StringBuilder sb = new StringBuilder(s.length() * 2);
    sb.append("\\Q");
    slashEIndex = 0;
    int current = 0;
    while ((slashEIndex = s.indexOf("\\E", current)) != -1) {
        sb.append(s.substring(current, slashEIndex));
        current = slashEIndex + 2;
        sb.append("\\E\\\\E\\Q");
    }
    sb.append(s.substring(current, s.length()));
    sb.append("\\E");
    return sb.toString();
}

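So as long as the string contains no "\\E", plain wrapping is fine. Otherwise, every "\\E" has to be replaced with "\\E\\\\E\\Q": the first "\\E" closes the quoted section, "\\\\E" matches a literal \E outside the quote, and "\\Q" reopens the quote.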
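The substitution can be checked with a short sketch. The class and the method quoteViaReplace are hypothetical, written only to mirror the loop above with String.replace; this is not how the JDK implements it.

```java
import java.util.regex.Pattern;

class QuoteSubstitution {
    // Hypothetical re-implementation, for illustration only.
    static String quoteViaReplace(String s) {
        // String.replace substitutes literal text, not a regex.
        return "\\Q" + s.replace("\\E", "\\E\\\\E\\Q") + "\\E";
    }

    public static void main(String[] args) {
        // Runtime text: \Q[r.e.g.e.x]\E
        String input = "\\Q[r.e.g.e.x]\\E";

        String quoted = Pattern.quote(input);
        System.out.println(quoted); // \Q\Q[r.e.g.e.x]\E\\E\Q\E
        System.out.println(quoted.equals(quoteViaReplace(input))); // true
        System.out.println(Pattern.matches(quoted, input)); // true
    }
}
```

The trailing \Q\E in the output is an empty quoted section: the input ends with \E, so nothing is left to quote after the last substitution.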

Willem Van Onsem