
I have two regexes in my JavaScript code:

const regex = /[0-9a-zA-Z_ ]{3,}/gm;
const htmlRegex = /\r\n|\n|\r|&nbsp;|<[^>]*>/gm;

The first regex checks whether the string contains at least 3 matching characters in a row, and the second checks whether the string includes HTML code. But these regexes are too slow: they take 1-2 seconds.

How can I fix this? Thanks in advance.

Aminudin
  • Please reconsider using regex to detect html? – evolutionxbox Mar 29 '21 at 01:05
  • do you have another solution? @evolutionxbox – Aminudin Mar 29 '21 at 01:07
  • Sorry I misunderstood “detect” to mean “parse”. Does this question help? https://stackoverflow.com/questions/15458876/check-if-a-string-is-html-or-not/15458987 – evolutionxbox Mar 29 '21 at 01:11
  • What kind of input do you run this on, and how, so it takes "1-2 second[s]"? Why do they have `g` flags, if they just check for existence? – ASDFGerte Mar 29 '21 at 01:17
  • This regex `[0-9a-zA-Z_ ]{3,}` matches 3 or more of any of the listed characters and could also match 3 spaces. To match 3 digits you can use `\d{3}`. In the second pattern you can shorten the alternation to `\r?\n|&nbsp;|<[^>]*>` – The fourth bird Mar 29 '21 at 09:01
  • If you need to check at least one occurrence use `regex.test(string)` and `htmlRegex.test(string)`, and remove `gm` flags. – Wiktor Stribiżew Mar 29 '21 at 09:33

1 Answer


Regarding the regex patterns:

  • `[0-9a-zA-Z_ ]{3,}` is fully unanchored, so it can match at any position in the string: at each position the regex engine tries to consume as many alphanumeric, underscore, or space chars as possible and needs at least 3 of them. When it finds only a partial match, say 1 or 2 such chars, the attempt fails and the search restarts from the next position. With longer strings that contain no match, this may lead to slow performance. It is not easy to improve a pattern like this, especially without knowing the exact requirements. In most cases, you just need to work out the left-hand boundary and make it as precise as possible. Say, if you want your matches to start with a word char, you could use `\w[\w ]{2,}` (note `\w` = `[0-9a-zA-Z_]`).
  • `\r\n|\n|\r|&nbsp;|<[^>]*>` is not optimal because the `\r\n` and `\r` alternatives can start matching at the same location in the string, which increases the number of backtracking steps. The best way to solve that is to make sure different alternatives cannot match at the same location, e.g. `\r\n?|\n|&nbsp;|<[^>]*>`.
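
As a rough sanity check of the two rewrites above, the sketch below compares the original and revised patterns on a handful of inputs. The sample strings and variable names are made up for illustration, and the `\w[\w ]{2,}` variant assumes that matches really must start with a word char, which the question does not confirm.

```javascript
// Patterns quoted in the answer, written without flags since we only test for a match.
const wordOriginal = /[0-9a-zA-Z_ ]{3,}/;
const wordRevised = /\w[\w ]{2,}/; // assumption: a match must start with a word char
const htmlOriginal = /\r\n|\n|\r|&nbsp;|<[^>]*>/;
const htmlRevised = /\r\n?|\n|&nbsp;|<[^>]*>/;

// Hypothetical sample inputs, chosen to exercise every alternative.
const htmlSamples = [
  "plain text",
  "line1\r\nline2",
  "line1\rline2",
  "line1\nline2",
  "a&nbsp;b",
  "<p>hi</p>",
  "a < b",
];

// The two HTML patterns should give the same yes/no answer on every sample.
const htmlAgree = htmlSamples.every(
  (s) => htmlOriginal.test(s) === htmlRevised.test(s)
);
console.log(htmlAgree); // true

// The revised word pattern is stricter: a run made only of spaces no longer matches.
console.log(wordOriginal.test("   "), wordRevised.test("   ")); // true false
console.log(wordOriginal.test("a b"), wordRevised.test("a b")); // true true
console.log(wordOriginal.test("ab"), wordRevised.test("ab")); // false false
```

Agreement on a few samples is not a proof of equivalence, and it says nothing about speed; timing both patterns on the real input is the only way to confirm an improvement.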

Regarding the code, you should consider using `RegExp.test()` if you simply want to check whether a match is found in a string. For that to work properly, you need to remove the `g` flag, since a global regex keeps its `lastIndex` between calls and repeated `test()` calls can then return inconsistent results. Note that the `m` flag is redundant in your regexps since you use neither `^` nor `$`. So, your check would look like `/\r\n?|\n|&nbsp;|<[^>]*>/.test(string)`.
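
A minimal sketch of that check, and of what goes wrong when the `g` flag is kept, might look like this. The helper name `containsHtmlLike` and the input string are hypothetical; only the pattern comes from the answer.

```javascript
// Pattern suggested in the answer, without flags.
const htmlLike = /\r\n?|\n|&nbsp;|<[^>]*>/;

// Hypothetical helper: true if the text contains a line break, &nbsp; or a tag.
function containsHtmlLike(text) {
  return htmlLike.test(text);
}

// Same pattern with the g flag, kept only to show why it should be removed.
const htmlLikeGlobal = /\r\n?|\n|&nbsp;|<[^>]*>/g;

const input = "<br>"; // hypothetical input

// Without g, repeated calls are stateless and always agree.
console.log(containsHtmlLike(input), containsHtmlLike(input)); // true true

// With g, the regex remembers lastIndex between calls.
const first = htmlLikeGlobal.test(input); // matches "<br>", lastIndex becomes 4
const second = htmlLikeGlobal.test(input); // search resumes at index 4, nothing left
console.log(first, second); // true false
```

After the failed second call `lastIndex` is reset to 0, so a third call would return `true` again, which is why results from a shared global regex can appear to flip at random.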

Wiktor Stribiżew