The answers here suggest using Pattern.quote to escape the special regex characters.

The problem with Pattern.quote is that it escapes the string as a whole, not each special character on its own.

This is my case:
I receive a string from the user and need to search for it in a document. Since the user can't pass newline characters (it's a bug in a 3rd-party API I have no access to), I decided to treat any whitespace sequence as "\s+" and use a regex to search the document. This way the user can send a simple whitespace character instead of a newline character.

For instance, if the document is:

The \s metacharacter is used to find a whitespace character.

A whitespace character can be:

  • A space character
  • A tab character
  • A carriage return character
  • A new line character
  • A vertical tab character
  • A form feed character

Then the received string

    String receivedStr = "The \\s metacharacter is used to find a whitespace character. A whitespace character can be:";
    

should be found in the document.

To achieve this I want to quote the string, and then replace any whitespace sequence with the string "\s+".
Using the following code:

    receivedStr = Pattern.quote(receivedStr).replaceAll("\\s+", "\\\\s+");
    

yields the regex:

    \QThe\s+\s\s+metacharacter\s+is\s+used\s+to\s+find\s+a\s+whitespace\s+character.\s+A\s+whitespace\s+character\s+can\s+be:\E

which will of course ignore my added "\s+" sequences (everything between \Q and \E is taken literally), instead of the expected:

    The\s+\\s\s+metacharacter\s+is\s+used\s+to\s+find\s+a\s+whitespace\s+character.\s+A\s+whitespace\s+character\s+can\s+be:

which escapes only the "\s" literal and not the entire string.

Is there an alternative to Pattern.quote that escapes single literals instead of the whole string?
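
To make the failure concrete, here is a minimal sketch that reproduces it on a shortened version of the example. The class name `QuoteWholeStringDemo`, the helper `quoteThenReplace`, and the shortened strings are illustrative assumptions, not part of my real code.

```java
import java.util.regex.Pattern;

class QuoteWholeStringDemo {
    // Quote the whole string first, then replace whitespace runs with \s+ (the approach from the question).
    static String quoteThenReplace(String s) {
        return Pattern.quote(s).replaceAll("\\s+", "\\\\s+");
    }

    public static void main(String[] args) {
        // Shortened, hypothetical document and search string.
        String document = "A whitespace character\ncan be:";
        String received = "A whitespace character can be:";

        String regex = quoteThenReplace(received);
        System.out.println(regex); // \QA\s+whitespace\s+character\s+can\s+be:\E

        // The inserted \s+ lies between \Q and \E, so it is matched as the
        // literal characters '\', 's', '+' and the newline is never matched.
        System.out.println(Pattern.compile(regex).matcher(document).find()); // false
    }
}
```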

Elist

1 Answer

I would suggest something like this:

    String re = Stream.of(input.split("\\s+"))
                      .map(Pattern::quote)
                      .collect(Collectors.joining("\\s+"));
    

This makes sure every non-whitespace token gets quoted (including text that would otherwise be interpreted as look-arounds or as constructs that could cause exponential blowup during matching), and any whitespace entered by the user ends up as an unquoted \s+.

Example input:

    Lorem \\b ipsum \\s dolor (sit) amet.
    

Output:

    \QLorem\E\s+\Q\b\E\s+\Qipsum\E\s+\Q\s\E\s+\Qdolor\E\s+\Q(sit)\E\s+\Qamet.\E
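
As a self-contained sketch, the snippet above can be wrapped in a small class and checked against a document like the one in the question. The class name `WhitespaceTolerantRegex`, the method name `toRegex`, and the shortened document string are assumptions made for illustration.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WhitespaceTolerantRegex {
    // Quote each whitespace-separated token on its own, then join the
    // quoted tokens with an unquoted \s+ so any whitespace run matches.
    static String toRegex(String input) {
        return Stream.of(input.split("\\s+"))
                     .map(Pattern::quote)
                     .collect(Collectors.joining("\\s+"));
    }

    public static void main(String[] args) {
        // The document contains a literal backslash-s and a blank line.
        String document = "The \\s metacharacter is used to find a whitespace character.\n\n"
                        + "A whitespace character can be:";
        // The user sends a single space where the document has newlines.
        String received = "The \\s metacharacter is used to find a whitespace character. "
                        + "A whitespace character can be:";

        Pattern p = Pattern.compile(toRegex(received));
        System.out.println(p.matcher(document).find()); // true
    }
}
```

One thing to watch: if the input starts with whitespace, `split` returns an empty first token, so the resulting regex requires leading whitespace in the document. Calling `trim()` on the input first is probably what you want in that case.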
    
aioobe
    • I would probably use such a solution, though I will need to implement a Java 7 version of it. Thanks! – Elist Mar 26 '15 at 11:34
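
For the Java 7 case raised in the comment, a loop-based sketch of the same idea could look like the following. It is not taken from the answer; the class name `WhitespaceTolerantRegexJava7` and the method name `toRegex` are illustrative, and it assumes the same split-quote-join behavior as the stream version.

```java
import java.util.regex.Pattern;

class WhitespaceTolerantRegexJava7 {
    // Same technique without streams: split on whitespace, quote each
    // token separately, and put an unquoted \s+ between the tokens.
    static String toRegex(String input) {
        StringBuilder sb = new StringBuilder();
        String[] tokens = input.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append("\\s+");
            }
            sb.append(Pattern.quote(tokens[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same example input as in the answer, written as a Java literal.
        System.out.println(toRegex("Lorem \\b ipsum \\s dolor (sit) amet."));
    }
}
```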