
I want to quote a string so that it is treated literally inside a larger regular expression, and that expression must conform to the POSIX Extended Regular Expression (ERE) syntax.

This question is very similar to an existing question, except that the answer there does not help me: it suggests `Pattern.quote()`, which relies on the special `\Q` and `\E` markers. Those are supported by Java regexes but are not part of the POSIX Extended syntax.

For example, I want `one.two` to become `one\.two`, not `\Qone.two\E`.
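To make the goal concrete, here is a small sketch that contrasts the wanted form with what `Pattern.quote()` returns. It uses `java.util.regex`, which is not a POSIX ERE engine; the assumption is only that `\.` means a literal dot in both syntaxes. The class name `LiteralDemo` is hypothetical.

```java
import java.util.regex.Pattern;

class LiteralDemo {
    // The form the question wants: the special character gets its own backslash.
    static final String WANTED = "one\\.two";                    // regex text: one\.two
    // What Pattern.quote() produces instead; \Q and \E are not POSIX ERE syntax.
    static final String JAVA_QUOTED = Pattern.quote("one.two");   // regex text: \Qone.two\E

    public static void main(String[] args) {
        System.out.println(WANTED);
        System.out.println(JAVA_QUOTED);
        // "\." is a literal dot here, so only the exact text matches.
        System.out.println(Pattern.matches(WANTED, "one.two"));    // true
        System.out.println(Pattern.matches(WANTED, "oneXtwo"));    // false
        // Without escaping, '.' matches any character.
        System.out.println(Pattern.matches("one.two", "oneXtwo")); // true
    }
}
```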

Oak

2 Answers


Maybe something along these lines:

```java
// untested
String escape(String inString)
{
    StringBuilder builder = new StringBuilder(inString.length() * 2);
    // Characters that get a backslash prefix.
    String toBeEscaped = "\\{}()[]*+?.|^$";

    for (int i = 0; i < inString.length(); i++)
    {
        char c = inString.charAt(i);

        // String.contains() takes a CharSequence, not a char, so use indexOf().
        if (toBeEscaped.indexOf(c) >= 0)
        {
            builder.append('\\');
        }

        builder.append(c);
    }

    return builder.toString();
}
```
Brian Reichle
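A self-contained sketch of the same loop, restricted to the characters POSIX lists as ERE special characters outside a bracket expression (`.[\()*+?{|^$`). As I read the spec, `]` and `}` are ordinary characters in an ERE, and a backslash before an ordinary character is undefined, so this variant leaves them unescaped; check this against the engine you actually target. The result is meant to be spliced into the larger regex outside any `[...]`, since inside a POSIX bracket expression the backslash is not special. The class name `PosixEreEscaper` is hypothetical.

```java
class PosixEreEscaper {
    // ERE special characters outside bracket expressions (assumption: per POSIX XBD 9.4.3).
    private static final String ERE_SPECIALS = ".[\\()*+?{|^$";

    static String escape(String in) {
        StringBuilder sb = new StringBuilder(in.length() * 2);
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            // Prefix only the ERE special characters with a backslash.
            if (ERE_SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("one.two")); // one\.two
        System.out.println(escape("a[b]{c}")); // a\[b]\{c}
    }
}
```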

The answer by Brian can be simplified to:

```java
String escape(String inString) {
    String toBeEscaped = "\\{}()[]*+?.|^$";
    return inString.replaceAll("[\\Q" + toBeEscaped + "\\E]", "\\\\$0");
}
```

Tested with "one.two" only.

maaartinus
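A hedged variant of the one-liner: `String.replaceAll()` is documented as equivalent to `Pattern.compile(regex).matcher(str).replaceAll(repl)`, so it compiles the regex on every call, whereas this sketch compiles it once. It also writes the character class with individual escapes instead of `\Q...\E`, so it does not depend on how `\Q...\E` is handled inside a character class. The class name `ReplaceAllEscaper` is hypothetical; the character set is the one from the answers above.

```java
import java.util.regex.Pattern;

class ReplaceAllEscaper {
    // Same characters as toBeEscaped, written as an explicit character class.
    // Inside a Java character class '\', '[' and ']' must be escaped; '^' is a
    // literal here because it is not the first character of the class.
    private static final Pattern SPECIAL =
            Pattern.compile("[\\\\{}()\\[\\]*+?.|^$]");

    static String escape(String in) {
        // The replacement "\\\\$0" is the text \\$0: a literal backslash
        // followed by the whole match (group 0), i.e. the special character.
        return SPECIAL.matcher(in).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(escape("one.two")); // one\.two
    }
}
```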
  • How do you mean it? It works for all examples I can come up with. – maaartinus Mar 01 '11 at 12:05
  • @Sean: It replaces one character *at a time*, but `replaceAll()` iterates through all the characters in the string. It does take a ridiculous amount of code to replace that one character, though. I've always just done it @Brian's way; it's *so* much easier to read. – Alan Moore Mar 01 '11 at 13:25
  • I was hoping there was a built-in method for this... but it looks like I'll have to write it myself or just use your very concise solution. Thank you. – Oak Mar 01 '11 at 16:48
  • @Oak There's j.u.r.Pattern.RemoveQEQuoting() doing something like this, but it's private and specialized. @Alan I think Brian's code is neither easier to read nor faster. Of course, the former is subjective and the latter would need some microbenchmarking. – maaartinus Mar 01 '11 at 16:55
  • `RemoveQEQuoting()` is only there to fix a bug in the original implementation. Ironically, that bug manifested only when `\Q` and `\E` were used inside character classes, as your solution does. – Alan Moore Mar 01 '11 at 17:44
  • [Here you go.](http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6173522) The bug report doesn't discuss the solution, but if you look at the source for JDK 1.5 you'll see that it treats `\Q...\E` as described in the report; `RemoveQEQuoting()` was added in JDK 1.6 ("mustang"). – Alan Moore Mar 01 '11 at 23:57
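maaartinus notes that the one-liner works for every example he tried. One rough way to check that it agrees with Brian's loop is to run both on every `char` value and report the first disagreement. This is a sketch: `escapeLoop` is Brian's loop with `indexOf` in place of `contains`, `escapeReplaceAll` is the one-liner, and the class and method names are hypothetical. It assumes a JDK newer than 1.5 (see the bug discussed above) and checks equivalence only, not speed.

```java
class EscaperComparison {
    // The character set used by both answers.
    static final String TO_BE_ESCAPED = "\\{}()[]*+?.|^$";

    // Brian's loop, with indexOf() because String.contains() needs a CharSequence.
    static String escapeLoop(String in) {
        StringBuilder sb = new StringBuilder(in.length() * 2);
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (TO_BE_ESCAPED.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // maaartinus's one-liner (compiles the pattern on every call).
    static String escapeReplaceAll(String in) {
        return in.replaceAll("[\\Q" + TO_BE_ESCAPED + "\\E]", "\\\\$0");
    }

    // Puts every char value between two letters and returns the first value
    // on which the two versions disagree, or -1 if they always agree.
    static int firstMismatch() {
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            String s = "a" + (char) c + "b";
            if (!escapeLoop(s).equals(escapeReplaceAll(s))) {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMismatch());
    }
}
```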