Why does /^(.+)+Q$/.test("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX") take so long?

Question

When I run

/^(.+)+Q$/.test("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")

in Chrome or IE, it takes ~10 seconds to complete. (Firefox is able to evaluate it almost instantly.)

Why does it take so long? (And why/how is Firefox able to do it so quickly?)

(Of course, I'd never run this particular regex, but I'm hitting a similar issue with the URL regex at http://daringfireball.net/2010/07/improved_regex_for_matching_urls and it seems to boil down to this, i.e. there are certain URLs which will cause the browser to lock up)

For example:

var re = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;
re.test("http://google.com/?q=(AAAAAAAAAAAAAAAAAAAAAAAAAAAAA")

One reason could be that it does a lot of backtracking. This does not explain though why Firefox is faster. Maybe they have some additional optimization. If you want to learn more about the inner workings of regex engines, I suggest to read http://shop.oreilly.com/product/9780596528126.do — Felix Kling, Apr 18 '12 at 21:29
Any regex engine that hits "catastrophic backtracking" on `(x+x+)+y` quite simply isn't worth a damn. That regex does not require any backtracking whatsoever. Not if implemented via an NFA simulator, and not as a DFA. — Kaz, Apr 18 '12 at 21:31
Is this an exercise in intellectual curiosity? If not, please explain why you're using that regex because it seems horrible unncessary as `/.+Q$/` will work just fine or even perhaps just `/Q$/`. — jfriend00, Apr 18 '12 at 21:34
Firefix probably detected that optimization; Chrome and IE evaluated the nested quantifiers naively. — Ray Toal, Apr 18 '12 at 21:39
@jfriend00 The actual issue I'm having is with a URL matching regex that I've added a link to — David Ingersol, Apr 18 '12 at 21:41
The NFA-simulator-based regex engine I wrote, closely based on the algorithm descriptions in The Dragon Book (*Compilers: Principles, Techniques and Tools*, Aho, Sethi, Ullman, 1988) handles this regex match in about 1.9 microseconds on a Core 2 Duo laptop (P8400 @ 2.26 Ghz). That could use some work. — Kaz, Apr 18 '12 at 21:44
It seems that the real question is how do you solve the problem in your article with good performance in all browsers. — jfriend00, Apr 18 '12 at 21:46
@jfriend00 Indeed, http://stackoverflow.com/questions/10218594 — David Ingersol, Apr 18 '12 at 21:53

Matthew Flaschen · Accepted Answer · 2012-04-19T01:14:21.373

8

As indicated by thg435, it sounds like catastrophic back-tracking. There's an excellent article on this, Regular Expression Matching Can Be Simple And Fast.

It describes an efficient approach known as Thompson NFA. As noted, though, this does not support all features of modern regexes. For instance, it can't do backreferences. However, as suggested in the article:

"Even so, it would be reasonable to use Thompson's NFA simulation for most regular expressions, and only bring out backtracking when it is needed. A particularly clever implementation could combine the two, resorting to backtracking only to accommodate the backreferences."

I suspect Firefox may be doing this.

edited Apr 19 '12 at 01:14

answered Apr 18 '12 at 21:29

Matthew Flaschen

278,309
50
514
539

2

If he said it in the comment, shouldn't he post it as an answeR? – Martin. Apr 18 '12 at 21:30
2

@Martin., I'm providing a totally different article. And I never said he shouldn't post an answer. – Matthew Flaschen Apr 18 '12 at 21:31
1

well, you posted the answer before you posted the link – Martin. Apr 18 '12 at 21:32

Why does /^(.+)+Q$/.test("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX") take so long?

1 Answers1

Linked