80

Could someone please give a complete list of special characters that should be escaped?

I fear I don't know some of them.

Danny Beckett
Somebody

7 Answers

74

PHP's preg_quote function takes arbitrary strings and "puts a backslash in front of every character that is part of the regular expression syntax" and it escapes these characters:

. \ + * ? [ ^ ] $ ( ) { } = ! < > | : -

Here is a simplified version of the JavaScript re-implementation of preg_quote from Locutus:

function escapeRegexChars(str) {
  return str.replace(new RegExp('[.\\\\+*?\\[\\^\\]$(){}=!<>|:\\-]', 'g'), '\\$&')
}
Boris Verkhovskiy
Tatu Ulmanen
  • If you're escaping these with str_replace, you should escape \ first. In the above list, if a . is replaced with \., \. will then be replaced with \\., which is not what is wanted. – Mark Rose Nov 06 '13 at 17:03
  • The colon (i.e. the `:`) should not be here, it is not a special regex character in JavaScript. – manymanymore Dec 16 '22 at 14:17
  • MDN recommends another expression in the https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions article – Vladimir Nikotin Jul 05 '23 at 08:14
9

According to this site, the list of characters to escape is

[, the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening round bracket ( and the closing round bracket ).

In addition, when the pattern is written as a string literal, you need to escape whichever character the JavaScript interpreter would read as the end of the string, that is, either ' or ".
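A small sketch of the quoting point, assuming the pattern is written as a JavaScript string literal and passed to `new RegExp()`. The variable names are illustrative only. The quote characters are not regex metacharacters; the backslash in front of them is consumed by the string literal, while a backslash meant for the regex engine has to be doubled inside the string.

```javascript
// Quotes are escaped for the string literal, not for the regex engine.
const doubleQuote = new RegExp("\"");   // the pattern is just "
const singleQuote = new RegExp('\'');   // the pattern is just '

// A backslash intended for the regex must be doubled in a string literal.
const literalDot = new RegExp("\\.");   // same pattern as /\./

console.log(doubleQuote.source);        // "
console.log(literalDot.test("a.b"));    // true
console.log(literalDot.test("ab"));     // false
```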

Andrea
  • Dunno what to make of that site. It covers lots of flavors of RegEx and doesn't specify which of those this list applies to. – BaldEagle Jul 30 '16 at 15:20
7

Based off of Tatu Ulmanen's answer, my solution in C# took this form:

private static List<string> RegexSpecialCharacters = new List<string>
{
    "\\",
    ".",
    "+",
    "*",
    "?",
    "[",
    "^",
    "]",
    "$",
    "(",
    ")",
    "{",
    "}",
    "=",
    "!",
    "<",
    ">",
    "|",
    ":",
    "-"
};


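// Start from the input and escape cumulatively; calling Replace on `input`
// in every iteration would keep only the last replacement.
rgxPattern = input;
foreach (var rgxSpecialChar in RegexSpecialCharacters)
    rgxPattern = rgxPattern.Replace(rgxSpecialChar, "\\" + rgxSpecialChar);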

Note that I have moved '\' ahead of '.' in the list. If the backslashes are not processed first, the backslashes added by the earlier replacements get escaped again and end up doubled.
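The ordering issue can be reproduced in a few lines. This is a minimal sketch, not the code from this answer: `escapeSequentially` is a hypothetical helper that escapes one character at a time with plain string replacement, in whatever order it is given.

```javascript
// Hypothetical helper: escape each listed character one at a time,
// in the order given, using plain (non-regex) string replacement.
function escapeSequentially(input, chars) {
  let result = input;
  for (const ch of chars) {
    result = result.split(ch).join("\\" + ch);
  }
  return result;
}

const backslashFirst = escapeSequentially("a.b", ["\\", "."]);
const backslashLast = escapeSequentially("a.b", [".", "\\"]);

console.log(backslashFirst); // a\.b
console.log(backslashLast);  // a\\.b

// Only the first pattern still matches the original text literally.
console.log(new RegExp(backslashFirst).test("a.b")); // true
console.log(new RegExp(backslashLast).test("a.b"));  // false
```

A single-pass replacement with one character class avoids the problem entirely, because each character of the input is visited only once.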

Edit

Here is a JavaScript translation:

var regexSpecialCharacters = [
    "\\", ".", "+", "*", "?",
    "[", "^", "]", "$", "(",
    ")", "{", "}", "=", "!",
    "<", ">", "|", ":", "-"
];

regexSpecialCharacters.forEach(rgxSpecChar => {
    input = input.replace(new RegExp("\\" + rgxSpecChar, "gm"), "\\" + rgxSpecChar);
});
hngr18
5

Inside a character set, to match a literal hyphen -, it needs to be escaped when not positioned at the start or the end. For example, given the position of the last hyphen in the following pattern, it needs to be escaped:

[a-z0-9\-_]+

But it doesn't need to be escaped here:

[a-z0-9_-]+

If you fail to escape a hyphen, the engine will attempt to interpret it as a range between the preceding character and the next character (just like a-z matches any character between a and z).
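To make the range behaviour concrete, here is a short sketch. The patterns other than the two quoted above are made up for illustration, and the helper name `throwsOnConstruct` is hypothetical.

```javascript
// Escaped hyphen in the middle of the set: a literal "-".
const escapedMiddle = new RegExp("^[a-z0-9\\-_]+$");
// Hyphen at the end of the set: also a literal "-", no escape needed.
const hyphenAtEnd = new RegExp("^[a-z0-9_-]+$");
// Unescaped hyphen between "+" and "/": a range from U+002B to U+002F,
// which silently also accepts "," and ".".
const accidentalRange = new RegExp("^[+-/]+$");

console.log(escapedMiddle.test("foo-bar_1")); // true
console.log(hyphenAtEnd.test("foo-bar_1"));   // true
console.log(accidentalRange.test(","));       // true
console.log(accidentalRange.test("a"));       // false

// Hypothetical helper: does constructing this pattern throw?
function throwsOnConstruct(pattern) {
  try {
    new RegExp(pattern);
    return false;
  } catch (err) {
    return true;
  }
}

// "_" (U+005F) sorts after "9" (U+0039), so this range is out of order.
console.log(throwsOnConstruct("[_-9]")); // true
```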

Additionally, /s do not need to be escaped inside a character set (though they do need to be escaped when outside a character set). So, the following syntax is valid:

const pattern = /[/]/;
CertainPerformance
jj2005
2

The answer here has become a bit more complicated with the introduction of Unicode regular expressions in JavaScript (that is, regular expressions constructed with the u flag). In particular:

  • Non-unicode regular expressions support "identity" escapes; that is, if a character does not have a special interpretation in the regular expression pattern, then escaping it does nothing. This implies that /a/ and /\a/ will match in an identical way.

  • Unicode regular expressions are more strict -- attempting to escape a character not considered "special" is an error. For example, /\a/u is not a valid regular expression.
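The two bullets above can be checked with a small sketch. The `compiles` helper is a hypothetical name introduced here; it only reports whether `new RegExp()` accepts a pattern with the given flags.

```javascript
// Hypothetical helper: report whether a pattern compiles with the given flags.
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (err) {
    return false;
  }
}

// Without the u flag, \a is an identity escape and simply matches "a".
console.log(compiles("\\a", ""));          // true
console.log(new RegExp("\\a").test("a"));  // true

// With the u flag, escaping a non-special character is rejected.
console.log(compiles("\\a", "u"));         // false

// Escaping a genuine syntax character is fine in both modes.
console.log(compiles("\\.", ""));          // true
console.log(compiles("\\.", "u"));         // true
```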

The set of specially-interpreted characters can be divined from the ECMAScript standard; for example, with ECMAScript 2021, https://262.ecma-international.org/12.0/#sec-patterns, we see the following "syntax" characters:

SyntaxCharacter :: one of
    ^ $ \ . * + ? ( ) [ ] { } |

In particular, in contrast to other answers, note that !, <, >, : and - are not considered syntax characters. Instead, these characters only have a special interpretation in specific contexts.

For example, the < and > characters only have a special interpretation when used as a capturing group name; e.g. as in

/(?<name>\w+)/

And because < and > are not considered syntax characters, escaping them is an error in unicode regular expressions.

> /\</
/\</

> /\</u
Uncaught SyntaxError: Invalid regular expression: /\</: Invalid escape

Additionally, the - character is only specially interpreted within a character class, when used to express a character range, as in e.g.

/[a-z]/

It is valid to escape a - within a character class, but not outside a character class, for unicode regular expressions.

> /\-/
/\-/

> /\-/u
Uncaught SyntaxError: Invalid regular expression: /\-/: Invalid escape

> /[-]/
/[-]/

> /[\-]/u
/[\-]/u

For a regular expression constructed using the / / syntax (as opposed to new RegExp()), interior slashes (/) would need to be escaped, but this is required for the JavaScript parser rather than the regular expression itself, to avoid ambiguity between a / acting as the end marker for a pattern versus a literal / in the pattern.

> /\//.test("/")
true

> new RegExp("/").test("/")
true

Ultimately though, if your goal is to escape characters so they are not specially interpreted within a regular expression, it should suffice to escape only the syntax characters. For example, if we wanted to match the literal string (?:hello), we might use:

> /\(\?:hello\)/.test("(?:hello)")
true

> /\(\?:hello\)/u.test("(?:hello)")
true

Note that the : character is not escaped. It might seem necessary to escape the : character because it has a special interpretation in the pattern (?:hello), but because it is not considered a syntax character, escaping it is unnecessary. (Escaping the preceding ( and ? characters is enough to ensure : is not interpreted specially.)
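Putting this together, a minimal sketch of an escaping helper that touches only the syntax characters listed in this answer might look like the following. The name `escapeSyntaxChars` is an assumption made for illustration, not a built-in, and the sketch assumes the escaped text is used outside a character class.

```javascript
// Hypothetical helper: escape only the SyntaxCharacter set
// ^ $ \ . * + ? ( ) [ ] { } |
function escapeSyntaxChars(text) {
  return text.replace(/[\^$\\.*+?()[\]{}|]/g, "\\$&");
}

const escaped = escapeSyntaxChars("(?:hello)");
console.log(escaped); // \(\?:hello\)

// The result is accepted with and without the u flag.
console.log(new RegExp(escaped).test("(?:hello)"));      // true
console.log(new RegExp(escaped, "u").test("(?:hello)")); // true

// By contrast, also escaping "-" (as the longer lists do) breaks under u.
const overEscaped = "a\\-b";
let rejectedWithU = false;
try {
  new RegExp(overEscaped, "u");
} catch (err) {
  rejectedWithU = true;
}
console.log(new RegExp(overEscaped).test("a-b")); // true
console.log(rejectedWithU);                       // true
```

If the escaped text were going to be spliced into a character class instead, the - character would need separate handling, as discussed above.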


Above code snippets were tested with:

$ node -v
v16.14.0

$ node -p process.versions.v8
9.4.146.24-node.20
Kevin Ushey
  • I appreciate the thorough answer, but it would be much better if you gave a short, maximally useful answer/summary up-front instead of expecting everyone to read the entire answer. – Boris Verkhovskiy Apr 18 '23 at 01:15
1

The problem:

const character = '+'
new RegExp(character, 'gi') // throws SyntaxError: the + has nothing to repeat

Smart solutions:

// with babel-polyfill
// Warning: will be removed from babel-polyfill v7
const character = '+'
const escapeCharacter = RegExp.escape(character)
new RegExp(escapeCharacter, 'gi') // /\+/gi

// ES5
const character = '+'
const escapeCharacter = escapeRegExp(character)
new RegExp(escapeCharacter, 'gi') // /\+/gi

function escapeRegExp(string){
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}
haravares
0

I was looking for this list because of ESLint's "no-useless-escape" rule for regular expressions, and found that some of the characters mentioned here do not need to be escaped in a JS regular expression. The longer list in the other answer here is for PHP, which does require the additional characters to be escaped.

In this GitHub issue for ESLint, about halfway down, user not-an-aardvark explains why the character referenced in the issue is one that arguably should be escaped.

In JavaScript, a character that NEEDS to be escaped is a syntax character, that is, one of these:

^ $ \ . * + ? ( ) [ ] { } |

The response to the github issue I linked to above includes explanation about "Annex B" semantics (which I don't know much about) which allows 4 of the above mentioned characters to be UNescaped: ) ] { }.

Another thing to note is that escaping a character that doesn't require escaping won't do any harm (except maybe if you're trying to escape the escape character). So, my personal rule of thumb is: "When in doubt, escape"

Michael S
  • "escaping a character that doesn't require escaping won't do any harm" Unfortunately, this is no longer true, at least for `` in Firefox: https://stackoverflow.com/questions/36953775/firefox-error-unable-to-check-input-because-the-pattern-is-not-a-valid-regexp – nrkn Apr 16 '19 at 23:39