
This line causes Chrome/Node.js to crash. How come?

In Chrome, it causes the browser tab to hang with no error message.

"www.asite.com/clothes-intimates-bras-bralettes/sub5-sub6-sub7-sub8".replace(/.*?([\w\s-]*)+\/?$/, 'www.asite.com/product/$1')
Jeremy J Starcher
MB.
  • What specific error are you getting in the console when it crashes? – powerc9000 May 28 '14 at 02:53
  • Firefox: "InternalError: an error occurred while executing regular expression" -- Your regex must be the problem. Post the expected output. – elclanrs May 28 '14 at 02:57
  • Looks like the `([\w\s-]*)+` causes too much backtracking and the engine gets in a weird state, or something like that. Doesn't happen when the `+` is removed. Solution: design your expression more carefully. – Felix Kling May 28 '14 at 02:57
  • Has to be a backtracking issue. This hung the Chrome console without an error message and totally froze up the tab. Had to shut down Chrome to get rid of it. – Jeremy J Starcher May 28 '14 at 02:58
  • It just kills the browser in Chrome. – MB. May 28 '14 at 02:59
  • I'm just curious why exactly this happens. Of course I changed the code. – MB. May 28 '14 at 02:59
  • Maybe this helps: http://stackoverflow.com/q/3212256/218196. It links to http://www.regular-expressions.info/catastrophic.html. – Felix Kling May 28 '14 at 03:08

1 Answer


It might help to examine your regular expression in pieces to understand what's going on. Here's the original.

.*?([\w\s-]*)+\/?$

And the breakdown:

.*?

. = any character except a line break, * = zero or more, and ? = non-greedy (lazy, so .*? tries the shortest match first).
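To make the non-greedy part concrete, here is a small comparison of .*? and .* on a shortened version of the URL (the shortened string is only an illustration):

```javascript
// A shortened, illustrative version of the question's input.
const url = "www.asite.com/clothes/sub5";

// Lazy: .*? grows one character at a time, so it stops at the FIRST "/".
const lazyPrefix = url.match(/^(.*?)\//)[1];

// Greedy: .* grabs everything, then gives characters back until a "/" fits,
// so it stops at the LAST "/".
const greedyPrefix = url.match(/^(.*)\//)[1];

console.log(lazyPrefix);   // www.asite.com
console.log(greedyPrefix); // www.asite.com/clothes
```

Both forms will still try the other lengths if the rest of the pattern fails; the ? only changes the order in which the engine tries them.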

([\w\s-]*)+

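() = a capture group, [] = a character class (matches one character from the set), \w = word characters (letters, digits, underscore), \s = whitespace, - = a literal dash, * = zero or more, and + = one or more (here applied to the whole group).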
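A quick sketch of the difference between the character class and the capture group, runnable in Node.js or a browser console. The test characters and the "ab-cd" string are arbitrary examples:

```javascript
// [\w\s-] is a character class: it matches exactly ONE character that is a
// word character (A-Z, a-z, 0-9, _), whitespace, or a literal "-".
const oneChar = /^[\w\s-]$/;
for (const c of ["a", "7", "_", " ", "-", ".", "/"]) {
  console.log(JSON.stringify(c), oneChar.test(c)); // only "." and "/" print false
}

// (...) is a capture group. When the group itself is repeated by +,
// $1 holds only what the LAST repetition matched.
const lastPiece = "ab-cd".replace(/^([\w-])+$/, "$1");
console.log(lastPiece); // d
```

The class rejecting . and / is why, in the URL, every run of class characters ends at a dot or a slash.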

\/?$

\/ = a literal slash, ? = may or may not occur (zero or one), and $ = the end of the input.
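The optional slash and the end anchor can be checked on their own; the sub8 pattern below is a cut-down illustration, not part of the original expression:

```javascript
// \/? allows zero or one "/", and $ requires the match to end at the very
// end of the string.
const tail = /sub8\/?$/;
console.log(tail.test("x/sub8"));   // true  (no slash)
console.log(tail.test("x/sub8/"));  // true  (one trailing slash)
console.log(tail.test("x/sub8//")); // false (two slashes)
console.log(tail.test("x/sub8/y")); // false (text after the slash)
```

The $ matters for the hang: a partial match that stops before the end of the string counts as a failure, and every failure sends the engine back to try another way of matching the earlier parts.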

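So essentially you're asking to match anything, followed by one or more repetitions of a possibly empty run of word characters, spaces, or dashes, perhaps followed by a slash, anchored to the end of the input string. The nested quantifier in ([\w\s-]*)+ (a * inside a +), together with the variable-length .*?, lets the same text be divided in an exponential number of ways, and the regular expression engine backtracks through those ways whenever the rest of the pattern fails. In your input, the 32-character run clothes-intimates-bras-bralettes is followed by /sub5-sub6-sub7-sub8, so $ cannot match right after it, and the engine tries the possible splits of that run before moving on.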
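To put a rough number on "exponential", one can count how many ways a run of n class characters can be cut into non-empty pieces, one piece per repetition of the group. countSplits is a hypothetical helper written only for this illustration; it ignores the extra choices from .*? and from a group that stops before the end of the run, so it understates the engine's real work:

```javascript
// Hypothetical helper for illustration: the number of ways to cut a run of
// n characters into one or more non-empty consecutive pieces, i.e. the ways
// the repetitions of ([\w\s-]*)+ can divide that run between them.
function countSplits(n) {
  // ways[i] = number of ways to split the first i characters.
  const ways = [1n]; // one way to split an empty prefix: no pieces
  for (let i = 1; i <= n; i++) {
    let total = 0n;
    // The last piece covers characters j .. i-1, for every j < i.
    for (let j = 0; j < i; j++) total += ways[j];
    ways.push(total);
  }
  return ways[n];
}

const run = "clothes-intimates-bras-bralettes"; // the run from the URL
console.log(run.length);                         // 32
console.log(countSplits(4).toString());          // 8
console.log(countSplits(run.length).toString()); // 2147483648 (= 2 ** 31)
```

So at a single starting position the engine may, in the worst case, face about two billion splits of that one run, which is consistent with the tab appearing to hang rather than reporting an error.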

Your expression matches null (the empty string) just as well as it matches -sub8, and just as well as it matches www.asite.com/clothes-intimates-bras-bralettes/sub5-sub6-sub7-sub8. Even a single run like sub8 can be divided among the repetitions of the group in several ways: one capture of sub8, or s then ub8, or su then b then 8, and so on, and every additional character doubles the count. Sorry to belabor the point; I'm just trying to get you to an intuitive understanding of the significance of a ([]*)+ style capture.
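The claim about short inputs is easy to confirm, as long as the inputs stay tiny (it is the long run in the real URL that hangs). This uses the pattern exactly as written in the question:

```javascript
// The pattern from the question, only ever run here on tiny inputs.
const re = /.*?([\w\s-]*)+\/?$/;
for (const s of ["", "-sub8", "sub8/"]) {
  const m = s.match(re);
  console.log(JSON.stringify(s), "matched", JSON.stringify(m[0]), "with $1 =", JSON.stringify(m[1]));
}
```

All three strings match: the empty string gives an empty match, -sub8 is captured whole, and for sub8/ the slash is consumed by \/? while $1 is sub8.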

The expression looks like an inexact translation of what you actually intend to match into a regular expression pattern. What were you trying to achieve?
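For comparison, here is a sketch of two safer ways to do what the original replace appears to be aiming for. This is an assumption about the intent, since the question doesn't say: keep only the last path segment and put it under www.asite.com/product/. The function names are made up for the example:

```javascript
// Assumption (not confirmed in the question): the goal is to keep only the
// last path segment and move it under "www.asite.com/product/".
const PRODUCT_PREFIX = "www.asite.com/product/";

// Regex version: the greedy .* must end at a literal "/", and the segment is
// one non-nested run of [\w\s-], so there is no exponential backtracking.
function toProductUrlRegex(url) {
  return url.replace(/^.*\/([\w\s-]+)\/?$/, PRODUCT_PREFIX + "$1");
}

// Non-regex version: drop one trailing "/" and take the last segment.
// Unlike the regex, it does not check which characters the segment contains.
function toProductUrlSplit(url) {
  const parts = url.replace(/\/$/, "").split("/");
  return PRODUCT_PREFIX + parts[parts.length - 1];
}

const input = "www.asite.com/clothes-intimates-bras-bralettes/sub5-sub6-sub7-sub8";
console.log(toProductUrlRegex(input));       // www.asite.com/product/sub5-sub6-sub7-sub8
console.log(toProductUrlSplit(input));       // www.asite.com/product/sub5-sub6-sub7-sub8
console.log(toProductUrlRegex(input + "/")); // www.asite.com/product/sub5-sub6-sub7-sub8
```

The regex stays cheap because the prefix .* has to stop at a literal /, which [\w\s-] can never match, and the segment uses a single quantifier rather than one nested inside another. The split version needs no backtracking beyond a trivial trailing-slash check.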

Paul
  • FWIW, changing the part to `([\w\s-]+)+` exposes the same problem though. So, I don't think that `([]*)+` matches infinitely, but it does get into a bad/weird state. – Felix Kling May 28 '14 at 06:11
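This observation can be reproduced on inputs small enough not to hang. The sketch below times the question's pattern, the ([\w\s-]+)+ variant from this comment, and a version with a single quantifier, on a URL whose middle segment is n copies of a. timeReplace and the input shape are invented for the test, and absolute timings will vary by engine and machine:

```javascript
// Times String.prototype.replace for a pattern on a URL whose middle segment
// is a run of n "a" characters. Exact timings depend on the engine and machine.
function timeReplace(re, n) {
  const input = "www.asite.com/" + "a".repeat(n) + "/sub8";
  const start = Date.now();
  const out = input.replace(re, "www.asite.com/product/$1");
  return { ms: Date.now() - start, out };
}

const patterns = {
  "([\\w\\s-]*)+": /.*?([\w\s-]*)+\/?$/, // the question's pattern
  "([\\w\\s-]+)+": /.*?([\w\s-]+)+\/?$/, // the variant from the comment: still nested
  "([\\w\\s-]+)": /.*?([\w\s-]+)\/?$/,   // one quantifier, no nesting
};

// Kept small on purpose: the nested patterns slow down very quickly as n grows.
for (const n of [14, 16, 18, 20]) {
  for (const [name, re] of Object.entries(patterns)) {
    const { ms, out } = timeReplace(re, n);
    console.log(`n=${n} ${name}: ${ms} ms -> ${out}`);
  }
}
```

On a backtracking engine, the two nested patterns should take noticeably longer as n grows (roughly 4x for each step of 2), while the single-quantifier version stays fast; all three produce the same replacement. Raising n toward the 32 characters of the real URL is not recommended.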