
I am using the following regular expression without restricting any character length:

var test =  /^(a-z|A-Z|0-9)*[^$%^&*;:,<>?()\""\']*$/ // Works fine

When I try to restrict the length to 15 characters, as below, it throws an error.

var test =  /^(a-z|A-Z|0-9)*[^$%^&*;:,<>?()\""\']*${1,15}/    // Uncaught SyntaxError: Invalid regular expression

How can I make the above regular expression work with a limit of 15 characters?

Peter Mortensen
Viku

1 Answer


You cannot apply quantifiers to anchors. Instead, to restrict the length of the input string, use a lookahead anchored at the beginning:

// ECMAScript (JavaScript, C++)
^(?=.{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$
 ^^^^^^^^^^^^

// Or, in flavors other than ECMAScript and Python
\A(?=.{1,15}\z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\z
  ^^^^^^^^^^^^^

// Or, in Python
\A(?=.{1,15}\Z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\Z
  ^^^^^^^^^^^^^
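
As a minimal sketch of how the ECMAScript variant might be used in JavaScript (the name `limited` and the sample inputs are made up for illustration and are not from the question):

```javascript
// The ECMAScript pattern from above; the lookahead caps the whole input at 1 to 15 characters.
const limited = /^(?=.{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$/;

// Hypothetical sample inputs, chosen only to exercise each case.
const results = {
  short: limited.test("abc123"),          // 6 characters, none of them forbidden
  exact15: limited.test("a".repeat(15)),  // exactly 15 characters
  tooLong: limited.test("a".repeat(16)),  // 16 characters: the lookahead fails
  empty: limited.test(""),                // {1,15} needs at least 1 character
  forbidden: limited.test("abc$"),        // "$" is excluded by the negated class
};

console.log(results);
```

Only `short` and `exact15` should come out as `true`; change `{1,15}` to `{0,15}` if an empty string should also be accepted.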

Also, I assume you wanted to match 0 or more letters or digits with (a-z|A-Z|0-9)*. It should look like [a-zA-Z0-9]* (i.e. use a character class here).
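
To illustrate the difference, here is a small JavaScript sketch (variable names are illustrative only). Inside a group, a-z is just the literal three-character text, not a range, which is presumably why the original pattern still "worked": the group matched zero times and the negated class consumed the input.

```javascript
// The group alternates between the literal strings "a-z", "A-Z" and "0-9".
const grouped = /^(a-z|A-Z|0-9)*$/;
// The character class matches any single ASCII letter or digit.
const charClass = /^[a-zA-Z0-9]*$/;

const groupedLiteral = grouped.test("a-z0-9"); // two literal alternatives in a row
const groupedWord = grouped.test("abc");       // "abc" is not one of the alternatives
const classWord = charClass.test("abc");       // three letters
const classLiteral = charClass.test("a-z");    // "-" is neither a letter nor a digit

console.log({ groupedLiteral, groupedWord, classWord, classLiteral });
```

Here `groupedLiteral` and `classWord` are `true`, while `groupedWord` and `classLiteral` are `false`.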

Why not use a limiting quantifier, like {1,15}, at the end?

A quantifier applies only to the subpattern immediately to its left, be it a group, a character class, or a literal symbol. Thus, ^[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']{1,15}$ only restricts the length of the second character class [^$%^&*;:,<>?()\"'] to 1 to 15 characters. The ^(?:[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*){1,15}$ pattern "restricts" the sequence of 2 subpatterns to 1 to 15 repetitions, but each subpattern is of unlimited length (as * (and +, too) can match an unlimited number of characters), so the length of the whole input string is still not restricted.
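
A quick JavaScript check of the two attempts against a 41-character input, next to the lookahead version (the variable names and the sample input are assumptions made for this sketch):

```javascript
// {1,15} on the second character class only
const classOnly = /^[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']{1,15}$/;
// {1,15} on a group whose parts are themselves unbounded
const groupRepeated = /^(?:[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*){1,15}$/;
// Lookahead that restricts the whole input
const wholeInput = /^(?=.{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$/;

// Hypothetical input: 40 letters followed by "!" (41 characters in total).
const longInput = "a".repeat(40) + "!";

const classOnlyResult = classOnly.test(longInput);         // still matches
const groupRepeatedResult = groupRepeated.test(longInput); // still matches
const wholeInputResult = wholeInput.test(longInput);       // rejected by the lookahead

console.log({ classOnlyResult, groupRepeatedResult, wholeInputResult });
```

Both end-of-pattern quantifiers accept the over-long input; only the lookahead version rejects it.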

How does the lookahead restriction work?

The (?=.{1,15}$) / (?=.{1,15}\z) / (?=.{1,15}\Z) positive lookahead appears right after the ^/\A start-of-string anchor (note that in Ruby, \A is the only anchor that matches only at the start of the whole string). It is a zero-width assertion: it consumes no characters and only returns true or false after checking whether its subpattern matches the subsequent characters. So, this lookahead tries to match any 1 to 15 characters other than a newline (due to the limiting quantifier {1,15}), right up to the end of the string (due to the $/\z/\Z anchor). If we remove the $ / \z / \Z anchor from the lookahead, the lookahead will only require the string to start with at least 1 such character, and the total string length can be anything.
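
The effect of the anchor inside the lookahead can be seen in a short JavaScript sketch (names and inputs are illustrative assumptions):

```javascript
// Lookahead with the end-of-string anchor: limits the whole input.
const anchored = /^(?=.{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$/;
// Same lookahead without "$": only checks that at least 1 character follows.
const unanchored = /^(?=.{1,15})[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$/;

const twenty = "a".repeat(20);

const anchoredLong = anchored.test(twenty);     // 20 characters exceed the limit
const unanchoredLong = unanchored.test(twenty); // no length limit is enforced
const unanchoredEmpty = unanchored.test("");    // still needs 1 character

// The lookahead is zero-width: the overall match still starts at index 0
// and covers the entire input.
const matched = anchored.exec("abc")[0];

console.log({ anchoredLong, unanchoredLong, unanchoredEmpty, matched });
```

Only the anchored version rejects the 20-character input, and `matched` is the full string "abc" because the lookahead itself consumes nothing.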

If the input string can contain a newline sequence, you should use the portable [\s\S] any-character construct (it works in JS and other common regex flavors):

// ECMAScript (JavaScript, C++)
^(?=[\s\S]{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$
 ^^^^^^^^^^^^^^^^^

// Or, in flavors other than ECMAScript and Python
\A(?=[\s\S]{1,15}\z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\z
  ^^^^^^^^^^^^^^^^^^

// Or, in Python
\A(?=[\s\S]{1,15}\Z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\Z
  ^^^^^^^^^^^^^^^^^^
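
A small JavaScript comparison of the two lookaheads on multi-line input (a sketch with made-up names and inputs; neither regex uses the m or s flag):

```javascript
// "." does not match a newline (without the s flag), so the lookahead cannot reach the end.
const dotLookahead = /^(?=.{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$/;
// [\s\S] matches any character, newlines included.
const anyCharLookahead = /^(?=[\s\S]{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$/;

const twoLines = "abc\ndef";         // 7 characters including the newline
const sixteen = "a\n".repeat(8);     // 16 characters including newlines

const dotResult = dotLookahead.test(twoLines);         // rejected despite being short
const anyCharResult = anyCharLookahead.test(twoLines); // accepted
const anyCharTooLong = anyCharLookahead.test(sixteen); // the limit still applies

console.log({ dotResult, anyCharResult, anyCharTooLong });
```

With the `.` lookahead, the 7-character two-line input is rejected; with `[\s\S]` it is accepted, while the 16-character input is still rejected.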
Peter Mortensen
Wiktor Stribiżew
  • Not really sure you need this regex. It will match 0 or more letters or digits, and then 0 or more characters other than the ones in the `$%^&*;:,<>?()"'` set. Please clarify what strings are valid (you want to match) and those that are not. – Wiktor Stribiżew Sep 09 '15 at 10:41
  • If you have newline symbols in your string, replace the first look-ahead with `(?=[\s\S]{1,15}$)`. – Wiktor Stribiżew Sep 09 '15 at 13:10
  • In C# the above one won't work when trying to assign the regex to a string. Do I need to do this: ^(?=.{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\""']*$ ? – Viku Sep 10 '15 at 11:51
  • It will work if you use a normal string literal. With verbatim string literal, it will look like `var rx = new Regex(@"^(?=.{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()""']*$")`. – Wiktor Stribiżew Sep 10 '15 at 12:26
  • @WiktorStribiżew, I've read [this article](http://www.rexegg.com/regex-lookarounds.html) about using solely lookaheads for validation (not only about string length) like this: `\A(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})(?=\D*\d)\w{6,10}\z` - what do you think about such an approach? – Max Koretskyi Mar 03 '16 at 11:53
  • @Maximus The principle of contrast is a proper way to go in your pattern. It is not about restricting the input string length though, it is a specific validation issue. This regex will not work in JS, BTW. – Wiktor Stribiżew Mar 03 '16 at 11:56