
I am trying to create an application that matches a message template with a message that a user is trying to send. I am using Java regex for matching the message. The template/message may contain special characters.

How would I get the complete list of special characters that need to be escaped so that my regex works and matches in as many cases as possible?

Is there a universal solution for escaping all special characters in Java regex?

Ilya Kurnosov
Avinash Nair

10 Answers

  • Java characters that have to be escaped in regular expressions are:
    \.[]{}()<>*+-=!?^$|
  • Two of the closing brackets (] and }) only have to be escaped after opening the same type of bracket.
  • Inside []-brackets, some characters (like + and -) sometimes work without escaping.
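The bracket rules above are easy to check by compiling a few patterns. This is a small sketch (the class `EscapeChecks` and its `compiles` helper are made up for illustration) that exercises a dangling `]` and `}`, an unclosed `[`, and `+`/`-` inside a character class:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeChecks {
    // Returns true if the regex compiles without a PatternSyntaxException
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A lone ']' or '}' with no matching opener is treated as a literal
        System.out.println(Pattern.matches("a]", "a]"));      // true
        System.out.println(Pattern.matches("a}", "a}"));      // true
        // An unescaped '[' opens a character class that is never closed
        System.out.println(compiles("a["));                    // false
        System.out.println(Pattern.matches("a\\[", "a["));    // true
        // Inside [], '+' needs no escape, but '-' between two chars forms a range
        System.out.println(Pattern.matches("[+]", "+"));      // true
        System.out.println(Pattern.matches("[a-z]", "-"));    // false
        System.out.println(Pattern.matches("[a\\-z]", "-"));  // true
    }
}
```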
Tobi G.
  • Is there any way to not escape but allow those characters? – Dominika May 02 '16 at 11:00
  • Escaping a character means to allow the character instead of interpreting it as an operator. – Tobi G. May 27 '16 at 14:28
  • Unescaped `-` within `[]` may not always work since it is used to define ranges. It's safer to escape it. For example, the patterns `[-]` and `[-)]` match the string `-`, but `[(-)]` does not. – Kenston Choi Sep 12 '16 at 05:28
  • Even though the accepted answer does answer the question, this answer was more helpful to me when I was just looking for a quick list. – Old Nick Dec 05 '18 at 13:59
  • `-=!` do not necessarily need to be escaped, it depends on the context. For example as a single letter they work as a constant regex. – Hawk Aug 12 '20 at 09:03

You can look at the javadoc of the Pattern class: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

You need to escape any char listed there if you want the regular char and not the special meaning.

As a possibly simpler solution, you can put the template between \Q and \E; everything between them is treated as escaped.
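A minimal sketch of the \Q...\E idea, assuming the whole template should be matched literally; `QuoteWrap` and `matchesLiterally` are hypothetical names. As the comments on this answer point out, such a hand-rolled wrapper breaks if the template itself contains \E.

```java
import java.util.regex.Pattern;

class QuoteWrap {
    // Wraps the template in \Q...\E so every character is taken literally.
    // Caution: this breaks if the template itself contains "\E".
    static boolean matchesLiterally(String template, String message) {
        return Pattern.matches("\\Q" + template + "\\E", message);
    }

    public static void main(String[] args) {
        String template = "1+1=2? (yes) $5";
        System.out.println(matchesLiterally(template, "1+1=2? (yes) $5")); // true
        // Without \Q...\E, '+', '?', '(' and '$' are parsed as regex operators
        System.out.println(Pattern.matches(template, "1+1=2? (yes) $5"));  // false
    }
}
```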

azro
Sorin
  • If you find \Q and \E hard to remember, you can use Pattern.quote("...") instead. – mkdev Nov 06 '13 at 19:06
  • I wish you'd actually stated them – Aleksandr Dubinsky Jun 12 '14 at 23:24
  • Why, @AleksandrDubinsky ? – Sorin Jun 26 '14 at 08:15
  • @Sorin Because it is the spirit (nay, policy?) of Stack Exchange to state the answer in your answer rather than just linking to an off-site resource. Besides, that page doesn't have a clear list either. A list can be found here: http://docs.oracle.com/javase/tutorial/essential/regex/literals.html, yet it states "In certain situations the special characters listed above will *not* be treated as metacharacters," without explaining what will happen if one tries to escape them. In short, this question deserves a good answer. – Aleksandr Dubinsky Jun 26 '14 at 13:56
  • *"everything between them [`\Q` and `\E`] is considered as escaped"*, except other `\Q`'s and `\E`'s (which potentially may occur within original regex). So, it's better to use [`Pattern.quote`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#quote-java.lang.String-) as suggested [here](http://stackoverflow.com/a/37216573/1421194) and not to reinvent the wheel. – Sasha Nov 11 '16 at 21:22

To escape, you can simply use this (available since Java 1.5):

Pattern.quote("$test");

This will match exactly the literal text $test.
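A short sketch of using Pattern.quote to look for a literal fragment inside a longer message; the class and method names are only illustrative. It also checks the case raised in the comments here, an input that itself contains \E, which Pattern.quote handles.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Pattern.quote returns a regex that matches the fragment literally
    static boolean containsLiteral(String message, String fragment) {
        return Pattern.compile(Pattern.quote(fragment)).matcher(message).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("Total: $test units", "$test")); // true
        System.out.println(containsLiteral("Total: test units", "$test"));  // false
        // Unlike a hand-written \Q...\E wrapper, Pattern.quote also copes
        // with an embedded "\E" in the input
        String tricky = "a\\Eb"; // the four characters a \ E b
        System.out.println(Pattern.matches(Pattern.quote(tricky), tricky)); // true
    }
}
```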

madx
  • Why is this not the most highly rated answer? It solves the problem without going into the complex details of listing all characters that needs escaping and it's part of the JDK - no need to write any extra code! Simple! – Volksman Sep 24 '19 at 23:48
  • What if a regex contains \E? how can it be escaped? e.g: "\\Q\\Eeee\\E" throws a java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 4 – Asher A Jan 16 '22 at 15:15

According to the String Literals / Metacharacters documentation page, they are:

<([{\^-=$!|]})?*+.>

It would also be nice to have that list referenced somewhere in code, but I don't know where that could be.

Bohdan
  • `String escaped = tnk.replaceAll("[\\<\\(\\[\\{\\\\\\^\\-\\=\\$\\!\\|\\]\\}\\)\\?\\*\\+\\.\\>]", "\\\\$0");` – marbel82 May 18 '16 at 22:10
  • The Pattern javadoc says it is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct, **but** a backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct. Therefore a much simpler regex will suffice: `s.replaceAll("[\\W]", "\\\\$0")` where `\W` designates non-word characters. – Joe Bowbeer Aug 01 '17 at 07:19
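The javadoc rule quoted in that comment can be checked directly. This sketch (hypothetical class name) shows that escaping a harmless punctuation character is accepted, that a backslash before a letter with no defined meaning, such as \y, is rejected, and that digits are a special case because \1 is a back reference:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BackslashRules {
    // Returns true if the regex compiles without a PatternSyntaxException
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ';' is not special, but a backslash in front of it is still accepted
        System.out.println(Pattern.matches("\\;", ";"));        // true
        // A backslash before a letter that denotes no escaped construct is an error
        System.out.println(compiles("\\y"));                    // false
        // Digits are not letters, yet \1 is a back reference, not a literal 1
        System.out.println(Pattern.matches("(a)\\1", "aa"));    // true
    }
}
```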

Combining what everyone said, I propose the following, to keep the list of characters special to RegExp clearly listed in their own String, and to avoid having to try to visually parse thousands of "\\"'s. This seems to work pretty well for me:

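import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegExQuoting {
    // The regex special characters, kept readable in one plain string
    static final String regExSpecialChars = "<([{\\^-=$!|]})?*+.>";
    // Prefix each of them with a backslash so they can sit safely inside [...]
    static final String regExSpecialCharsRE = regExSpecialChars.replaceAll(".", "\\\\$0");
    static final Pattern reCharsREP = Pattern.compile("[" + regExSpecialCharsRE + "]");

    static String quoteRegExSpecialChars(String s) {
        Matcher m = reCharsREP.matcher(s);
        // $0 is the matched character; "\\\\" in the literal emits one backslash
        return m.replaceAll("\\\\$0");
    }
}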
NeuroDuck

Although the question is about Java, the code can easily be adapted from this Kotlin String extension I came up with (adapted from the one @brcolow provided):

private val escapeChars = charArrayOf(
    '<',
    '(',
    '[',
    '{',
    '\\',
    '^',
    '-',
    '=',
    '$',
    '!',
    '|',
    ']',
    '}',
    ')',
    '?',
    '*',
    '+',
    '.',
    '>'
)

fun String.escapePattern(): String {
    return this.fold("") {
      acc, chr ->
        acc + if (escapeChars.contains(chr)) "\\$chr" else "$chr"
    }
}

fun main() {
    println("(.*)".escapePattern())
}

prints \(\.\*\)

check it in action here https://pl.kotl.in/h-3mXZkNE
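For reference, a rough Java equivalent of the Kotlin extension might look like this. It is a sketch with made-up names; it uses the same character list but builds the result with a StringBuilder rather than fold with string concatenation, which allocates a new String for every character.

```java
class EscapePattern {
    // Same character set as the Kotlin escapeChars array
    private static final String ESCAPE_CHARS = "<([{\\^-=$!|]})?*+.>";

    static String escapePattern(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Prefix listed metacharacters with a single backslash
            if (ESCAPE_CHARS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapePattern("(.*)")); // \(\.\*\)
    }
}
```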

pocesar

On @Sorin's suggestion of the Java Pattern docs, it looks like chars to escape are at least:

\.[{(*+?^$|
  • `String escaped = regexString.replaceAll("([\\\\\\.\\[\\{\\(\\*\\+\\?\\^\\$\\|])", "\\\\$1");` – fracz Oct 01 '14 at 19:24
  • `)` also has to be escaped, and depending on whether you are inside or outside of a character class, there can be more characters to escape, in which case `Pattern.quote` does quite a good job at escaping a string for use both inside and outside of character class. – nhahtdh Jun 16 '15 at 05:40
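To illustrate the point about `)`, here is a sketch that applies fracz's replaceAll (wrapped in a hypothetical `escapeMinimal` helper) to the input (a). The closing parenthesis is left bare, so the resulting pattern does not compile, whereas Pattern.quote produces a usable pattern:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class MissingParen {
    // Escapes only \ . [ { ( * + ? ^ $ |  (note: no ')')
    static String escapeMinimal(String s) {
        return s.replaceAll("([\\\\\\.\\[\\{\\(\\*\\+\\?\\^\\$\\|])", "\\\\$1");
    }

    // Returns true if the regex compiles without a PatternSyntaxException
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String escaped = escapeMinimal("(a)");
        System.out.println(escaped);           // \(a)
        System.out.println(compiles(escaped)); // false: the ')' is left unmatched
        // Pattern.quote avoids maintaining a character list at all
        System.out.println(Pattern.matches(Pattern.quote("(a)"), "(a)")); // true
    }
}
```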

Pattern.quote(String s) sort of does what you want. However, it leaves a little to be desired: it doesn't actually escape the individual characters, it just wraps the string in \Q...\E.

There is not a method that does exactly what you are looking for, but the good news is that it is actually fairly simple to escape all of the special characters in a Java regular expression:

regex.replaceAll("[\\W]", "\\\\$0")

Why does this work? Well, the documentation for Pattern specifically says that it's permissible to escape non-alphabetic characters that don't necessarily have to be escaped:

It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.

For example, ; is not a special character in a regular expression. However, if you escape it, Pattern will still interpret \; as ;. Here are a few more examples:

  • > becomes \> which is equivalent to >
  • [ becomes \[ which is the escaped form of [
  • 8 is still 8.
  • \) becomes \\\), which is the escaped forms of \ and ) concatenated.

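Note: The key is the definition of "non-alphabetic". The regex above uses \W, which matches "non-word" characters, i.e. characters outside the set [a-zA-Z_0-9]. That leaves letters alone (a backslash before a letter is reserved for escaped constructs) and digits too (a backslash before a digit starts a back reference such as \1 or an octal escape such as \0101).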
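A minimal sketch of the \W approach, round-tripping a template full of punctuation; the class and method names are illustrative. Only characters outside [a-zA-Z_0-9] receive a backslash, and the escaped pattern should then match the original text exactly:

```java
import java.util.regex.Pattern;

class EscapeNonWord {
    // Prefix every non-word character (anything outside [a-zA-Z_0-9]) with a backslash
    static String escape(String s) {
        return s.replaceAll("[\\W]", "\\\\$0");
    }

    public static void main(String[] args) {
        String template = "Cost: $5 (approx.) [50%] a|b \\ done?";
        String regex = escape(template);
        System.out.println(regex);
        System.out.println(Pattern.matches(regex, template)); // true
    }
}
```

Letters, digits and the underscore pass through unchanged, which is exactly what keeps the result clear of reserved escapes and back references.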

wheeler

On the other side of the coin, if in your application context "special characters" means everything except digits, letters and whitespace, you can use a negated ("non-char") regex like this (note that \w also counts the underscore as a word character):

String regex = "[^\\s\\w]*";
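A sketch of how the negated class could be used to list the special characters in a message; the names are hypothetical. It drops the trailing * so that each find() returns exactly one character:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecialChars {
    // One character that is neither whitespace nor a word character [a-zA-Z_0-9]
    private static final Pattern SPECIAL = Pattern.compile("[^\\s\\w]");

    static List<String> findSpecials(String message) {
        List<String> found = new ArrayList<>();
        Matcher m = SPECIAL.matcher(message);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findSpecials("Hi, $user! 100% done_ok")); // [,, $, !, %]
    }
}
```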
Bo6Bear

Assuming that you have, and trust as authoritative, the list of escape characters Java regex uses (it would be nice if these characters were exposed in some Pattern class member), you can use the following method to escape a character only when it is necessary:

private static final char[] escapeChars = { '<', '(', '[', '{', '\\', '^', '-', '=', '$', '!', '|', ']', '}', ')', '?', '*', '+', '.', '>' };

private static String regexEscape(char character) {
    for (char escapeChar : escapeChars) {
        if (character == escapeChar) {
            return "\\" + character;
        }
    }
    return String.valueOf(character);
}
brcolow