1

I was learning regex with JavaScript and this code seems to be very slow.

/(a*a*)*b/.test("aaaaaaaaaaaaa")

To me it seems really straight forward, can someone explain to me why is this happening ?

Ahmed Tounsi
  • 1,482
  • 1
  • 14
  • 24
  • 2
    On what ground are you saying that the regex is slow ? How many ms, have you run a benchmark ? Did you compare with others way to match the data ? – Orelsanpls Jan 29 '20 at 16:23
  • @GrégoryNEUT It takes too long to run, and I am asking why this regex specifically takes too long. – Ahmed Tounsi Jan 29 '20 at 16:25
  • 1
    "too long" HOW long? – Ghost Jan 29 '20 at 16:26
  • 1
    @GrégoryNEUT I cannot tell you how many ms, because it did not finish running yet – Ahmed Tounsi Jan 29 '20 at 16:27
  • 2
    Why didn't you start with that, then? "It doesn't stop running" is very different from "it takes too long" –  Jan 29 '20 at 16:28
  • 4
    This is a [catastrophic backtracking](http://www.rexegg.com/regex-explosive-quantifiers.html) – Toto Jan 29 '20 at 16:28
  • 1
    Regex performance is really hard to reason about without a solid understanding of *how exactly the are evaluated*. The **tl;dr** is: `(a*a*)*` can be very, very expensive for pretty arcane reasons. Non-matching regex *tend* to be worse than matching regex in this case. Check out articles [like this one](https://www.loggly.com/blog/regexes-the-bad-better-best/). – Joachim Sauer Jan 29 '20 at 16:28
  • @JoachimSauer thank you for answering my question. I will be checking the article. – Ahmed Tounsi Jan 29 '20 at 16:29
  • 1
    @yxor: also check the link provided by Toto, they gave you the name of the actual phenomenon (which I couldn't remember), which should be useful in researching it. – Joachim Sauer Jan 29 '20 at 16:30
  • Thanks @Toto, I will be checking it. – Ahmed Tounsi Jan 29 '20 at 16:31
  • 1
    Regexes are in general slow, in terms of the effort they need to find matches. Some, in the server side can even be exploited for DOS attacks. Basically each regex is the representation of its own logic, like yours, it needs to figure out the match for the wildcard expansion. This is a simple regex, but still needs to be interpreted and check against the input. – Victor Jan 29 '20 at 16:32

1 Answers1

0

Regexes are in general slow, in terms of the effort they need to find matches. Some, in the server side can even be exploited for DOS attacks.

Basically each regex is the representation of its own logic, a syntax that actually represents a matching algorithm. In your case, it needs to figure out the match for the wildcard expansions. This is a simple regex, but still needs to be interpreted and check against the input.

Victor
  • 3,520
  • 3
  • 38
  • 58