1

I'm trying to use a regex in javascript where special characters and leading and trailing spaces are not allowed in an input element. Maximum character length is 50.

The regular expression I'm using is

/^[a-zA-Z0-9]+([ ]?[a-zA-Z0-9]+)*$/

But this is causing issues when the string is large and there is a special character in the end. On research I found that this is happening due to catastrophic backtracking but I'm unable to optimise the regex. Kindly help me find an optimum solution to this issue.

I tried debouncing the keyup event for the input element but that is also not solving the issue.

Luatic
  • 8,513
  • 2
  • 13
  • 34
Codermama
  • 19
  • 2
  • I think you might first assert that there is not an unexpected match at the end with a lookahead to fail the match if there is `^(?!.*[^\r\na-zA-Z0-9]$)[a-zA-Z0-9]+( [a-zA-Z0-9]+)*$` https://regex101.com/r/yrnbgH/1 – The fourth bird Mar 30 '23 at 17:57
  • Perhaps it would be better not to use a regex in this situation. The regex you have is extremely simple so a single for loop would likely be much faster. – Locke Mar 30 '23 at 18:03
  • @Locke no, a regex is the right tool here. It's just a shame that some JavaScript RegEx engines are implemented so poorly. – Luatic Mar 30 '23 at 18:37
  • I don't see catastrophic backtracking here. I also don't see an example of it to back up your claim. Why would `[ ]?` be optional here when `[a-zA-Z0-9]+`precedes it? The regex common scheme to avoid backtracking if using required greedy term before, is to allow an optional cluster with a required space as long as it is followed by alphanum. So the optimal regex if you want to lead with a greedy class is this `[a-zA-Z0-9]+([ ][a-zA-Z0-9]+)*`. The ideal is this `[a-zA-Z0-9]+([ ][a-zA-Z0-9])*`. Neither of these cause catastrophic backtracking. – sln Mar 30 '23 at 23:42

2 Answers2

1

It's as simple as removing the optional quantifier (?) from behind the space:
/^[a-zA-Z0-9]+([ ][a-zA-Z0-9]+)*$/ should be fine.

We need to argue two things:

  1. This doesn't change what the RegEx matches: The optional quantifier was unnecessary. The Kleene star (*) already makes the entire space-delimited part (([ ][a-zA-Z0-9]+)) optional. If there is no space, you'd have just alphanumerics, which the [a-zA-Z0-9]+ would've matched.
  2. This changes the runtime drastically: A character is either a space or matches [a-zA-Z0-9]. There are no "ambiguities"; there is only one state the RegEx engine can advance into: If it is an alphanumeric, it must stay in the greedy quantifier; if it is a space, it must advance, and then expect at least one alphanumeric. In particular, what can't happen anymore is that the RegEx engine has two options: Whether to enter another iteration of the Kleene star or stay in the current [a-zA-Z0-9]+.

I've applied some more changes, moving the Kleene star to the front and using the i flag to shorten it further (you can then omit the A-Z). The [ and ] brackets around the space were also unnecessary. Using The fourth bird's tests, here's how it performs: /^([a-z0-9]+ )*[a-z0-9]+$/i.

Like your current RegEx, this doesn't enforce the length requirement. I'd recommend enforcing the length requirement in JavaScript before testing against the RegEx. This will be the most readable & efficient.

Luatic
  • 8,513
  • 2
  • 13
  • 34
0

You might start the pattern matching a single char to prevent the lookahead firing on every line followed by asserting 0-49 chars after it.

Then either add the pattern in a capture group in a lookahead, and then match the string using the backreference to group 1 as lookaheads are atomic.

Using a case insensitive pattern with /i

^[a-z0-9](?=[a-z0-9 ]{0,49}$)(?=([a-z0-9]+(?: [a-z0-9]+)*))\1$

See a regex demo.

The fourth bird
  • 154,723
  • 16
  • 55
  • 70