
I have code like this:

url.match(/^https?\:\/\/([^\/:?#]+)(?:[\/:?#]|$)/ui)

ESLint says Parsing error: Invalid regular expression: /^https?\:\/\/([^\/:?#]+)(?:[\/:?#]|$)/: Invalid escape.

I don’t see why this regular expression is wrong. How should I fix it?

Sebastian Simon
Aero Wang

1 Answer


Unnecessary escape sequences are invalid with the u flag

\: is an unnecessary escape sequence, and unnecessary escape sequences are invalid when the u flag is set. Use a plain : instead; the \/ escapes can stay.
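A minimal sketch of the fix, assuming a Node.js or browser console; the sample URL and the names originalThrows, hostPattern and match are illustrative only. The original pattern is compiled with new RegExp so that the snippet itself still parses, because as a regex literal it fails at parse time, which is exactly what ESLint reported.

```javascript
// The original pattern: "\:" escapes a non-syntax character, which is a
// SyntaxError once the u flag is set. It is built with new RegExp here so
// that this file still parses; as a regex literal it fails at parse time.
let originalThrows = false;
try {
  new RegExp(String.raw`^https?\:\/\/([^\/:?#]+)(?:[\/:?#]|$)`, "ui");
} catch (e) {
  originalThrows = e instanceof SyntaxError;
}

// The fixed pattern: a plain ":" instead of "\:". The "\/" escapes may stay,
// because "/" is explicitly allowed as an identity escape with the u flag.
const hostPattern = /^https?:\/\/([^\/:?#]+)(?:[\/:?#]|$)/ui;

// Illustrative input; capture group 1 is the host name.
const match = "https://example.com:8080/path".match(hostPattern);

console.log(originalThrows); // true
console.log(match[1]);       // example.com
```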

Specification, debuggers, documentation

These are the valid and necessary escape sequences of special characters outside of character classes: \$, \(, \), \*, \+, \., \?, \[, \\, \], \^, \{, \|, \} (all “syntax characters”), and \/ (special case of an identity escape).

Other escape sequences like \ , \!, \", \#, \%, \&, \', \,, \-, \:, \;, \<, \=, \>, \@, \_, \`, \~ are unnecessary and thus invalid with the u flag.
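To check both lists directly, a small helper can try to compile each escape on its own, with and without the u flag. This is a sketch; escapeIsValid, necessary and unnecessary are hypothetical names, and the characters are tested outside any character class.

```javascript
// Syntax characters plus "/": escaping these is valid with the u flag.
const necessary = ["$", "(", ")", "*", "+", ".", "?", "[", "\\", "]", "^", "{", "|", "}", "/"];
// Escapes listed above as unnecessary, and hence invalid with the u flag.
const unnecessary = [" ", "!", '"', "#", "%", "&", "'", ",", "-", ":", ";", "<", "=", ">", "@", "_", "`", "~"];

// Returns true if the pattern "\" + ch compiles with the given flags.
function escapeIsValid(ch, flags) {
  try {
    new RegExp("\\" + ch, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(necessary.every((ch) => escapeIsValid(ch, "u")));  // true
console.log(unnecessary.some((ch) => escapeIsValid(ch, "u"))); // false
console.log(unnecessary.every((ch) => escapeIsValid(ch, "")));  // true (legacy mode)
```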

See the specification for all the escaping rules in detail (footnote 1).


Tools like regex101 report this, although a bit cryptically:

/\:/u:

\: — This token has no special meaning and has thus been rendered erroneous


As for documentation, I have just now added a note to the regex cheat sheet on MDN:

Note that some characters like :, -, @, etc. neither have a special meaning when escaped nor when unescaped. Escape sequences like \:, \-, \@ will be equivalent to their literal, unescaped character equivalents in regular expressions. However, in regular expressions with the unicode flag, these will cause an invalid identity escape error.
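The behavior the note describes can be observed directly. A minimal sketch; the names escaped, plain and error are illustrative:

```javascript
// Without the u flag, "\:", "\-" and "\@" are identity escapes that match the
// literal characters, so these two patterns behave the same.
const escaped = /\:\-\@/;
const plain = /:-@/;
console.log(escaped.test("a:-@b"), plain.test("a:-@b")); // true true

// With the u flag, the same escapes are rejected when the pattern is compiled.
let error = null;
try {
  new RegExp(String.raw`\:\-\@`, "u");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```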

Rationale

The note continues:

This is done to ensure backward compatibility with existing code that uses new escape sequences like \p or \k.

When Unicode property escapes (\p{…}) were proposed and introduced, this is what the proposal’s FAQ had to say:

What about backwards compatibility?

In regular expressions without the u flag, the pattern \p is an (unnecessary) escape sequence for p. Patterns of the form \p{Letter} might already be present in existing regular expressions without the u flag, and therefore we cannot assign new meaning to such patterns without breaking backwards compatibility.

For this reason, ECMAScript 2015 made unnecessary escape sequences like \p and \P throw an exception when the u flag is set. This enables us to change the meaning of \p{…} and \P{…} in regular expressions with the u flag without breaking backwards compatibility.
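The compatibility problem described in the FAQ can be reproduced in any engine that supports property escapes (ES2018 or later). A sketch with illustrative names:

```javascript
// Without u: "\p" is an identity escape for "p", and "{Letter}" is not a valid
// quantifier, so the legacy (Annex B) grammar reads it as literal characters.
const legacy = /\p{Letter}/;
// With u: "\p{Letter}" is a Unicode property escape matching any letter.
const property = /\p{Letter}/u;

console.log(legacy.test("p{Letter}")); // true: matches the literal text
console.log(legacy.test("\u00e9"));    // false: "é" is not "p{Letter}"
console.log(property.test("\u00e9"));  // true: "é" is a letter
console.log(property.test("1"));       // false: a digit is not a letter
```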

The FAQ is also linked from this ES Discuss thread, where the same question was raised:

Why is RegExp /\-/u a syntax error?

JSLint previously warned against unescaped literal - in RegExp. However, escaping - together with unicode flag u causes a syntax error in Chrome, Firefox, and Edge (and JSLint has since removed the warning). Just curious about the reason why the above edge-case is a syntax error.
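A quick sketch of the case from that thread. It also shows that the restriction concerns escapes outside character classes: \- is still a valid escape inside a class with the u flag. The helper name compiles is hypothetical.

```javascript
// Returns true if the pattern source compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(compiles(String.raw`\-`, ""));      // true: identity escape without u
console.log(compiles(String.raw`\-`, "u"));     // false: unnecessary escape with u
console.log(compiles("-", "u"));                // true: an unescaped hyphen is fine
console.log(compiles(String.raw`[a\-z]`, "u")); // true: "\-" is allowed inside a class
```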

(Minor grammar adjustments by me.)

The replies link to the proposal’s GitHub repository mentioned above, but also explain the rationale differently:

Think of the u flag as a strict mode for regular expressions.

So keep this in mind whenever you use the u flag: regular expressions behave a little differently as soon as u is set. Some constructs gain a new meaning (such as \p{…} property escapes and \u{…} code point escapes), while others become invalid. For another example, see Why is /[\w-+]/ a valid regex but /[\w-+]/u invalid?.
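The linked question is another instance of the same stricter grammar. In this sketch (the helper compiles is hypothetical), legacy mode tolerates a class escape such as \w as a range endpoint and reads the hyphen literally; with u that is a SyntaxError, and escaping the hyphen resolves it.

```javascript
// Returns true if the pattern source compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(compiles(String.raw`[\w-+]`, ""));   // true: "-" is treated literally
console.log(compiles(String.raw`[\w-+]`, "u"));  // false: \w cannot start a range
console.log(compiles(String.raw`[\w\-+]`, "u")); // true: escaped hyphen is allowed

// The fixed class matches word characters, "-" and "+".
const cls = new RegExp(String.raw`[\w\-+]`, "u");
console.log(cls.test("-"), cls.test("+"), cls.test("!")); // true true false
```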


1: You’ll find certain production rules with [U], a parameter that marks Unicode patterns (newer editions of the specification call it [UnicodeMode]). See the grammar notation reference for how to decode these.

Sebastian Simon