11

When I run

/^(.+)+Q$/.test("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")

in Chrome or IE, it takes ~10 seconds to complete. (Firefox is able to evaluate it almost instantly.)

Why does it take so long? (And why/how is Firefox able to do it so quickly?)

(Of course, I'd never run this particular regex, but I'm hitting a similar issue with the URL regex at http://daringfireball.net/2010/07/improved_regex_for_matching_urls and it seems to boil down to this, i.e. there are certain URLs which will cause the browser to lock up)

For example:

var re = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;
re.test("http://google.com/?q=(AAAAAAAAAAAAAAAAAAAAAAAAAAAAA")
  • 11
    http://www.regular-expressions.info/catastrophic.html – georg Apr 18 '12 at 21:28
  • 2
    One reason could be that it does a lot of backtracking. This does not explain though why Firefox is faster. Maybe they have some additional optimization. If you want to learn more about the inner workings of regex engines, I suggest to read http://shop.oreilly.com/product/9780596528126.do – Felix Kling Apr 18 '12 at 21:29
  • @thg post this as an answer, please – Martin. Apr 18 '12 at 21:30
  • Any regex engine that hits "catastrophic backtracking" on `(x+x+)+y` quite simply isn't worth a damn. That regex does not require any backtracking whatsoever. Not if implemented via an NFA simulator, and not as a DFA. – Kaz Apr 18 '12 at 21:31
  • @Kaz There is no such thing as backtracking in a FA... – Paul Apr 18 '12 at 21:33
  • 1
    Is this an exercise in intellectual curiosity? If not, please explain why you're using that regex because it seems horrible unncessary as `/.+Q$/` will work just fine or even perhaps just `/Q$/`. – jfriend00 Apr 18 '12 at 21:34
  • Firefix probably detected that optimization; Chrome and IE evaluated the nested quantifiers naively. – Ray Toal Apr 18 '12 at 21:39
  • @jfriend00 The actual issue I'm having is with a URL matching regex that I've added a link to – David Ingersol Apr 18 '12 at 21:41
  • The NFA-simulator-based regex engine I wrote, closely based on the algorithm descriptions in The Dragon Book (*Compilers: Principles, Techniques and Tools*, Aho, Sethi, Ullman, 1988) handles this regex match in about 1.9 microseconds on a Core 2 Duo laptop (P8400 @ 2.26 Ghz). That could use some work. – Kaz Apr 18 '12 at 21:44
  • It seems that the real question is how do you solve the problem in your article with good performance in all browsers. – jfriend00 Apr 18 '12 at 21:46
  • @jfriend00 Indeed, http://stackoverflow.com/questions/10218594 – David Ingersol Apr 18 '12 at 21:53

1 Answers1

8

As indicated by thg435, it sounds like catastrophic back-tracking. There's an excellent article on this, Regular Expression Matching Can Be Simple And Fast.

It describes an efficient approach known as Thompson NFA. As noted, though, this does not support all features of modern regexes. For instance, it can't do backreferences. However, as suggested in the article:

"Even so, it would be reasonable to use Thompson's NFA simulation for most regular expressions, and only bring out backtracking when it is needed. A particularly clever implementation could combine the two, resorting to backtracking only to accommodate the backreferences."

I suspect Firefox may be doing this.

Matthew Flaschen
  • 278,309
  • 50
  • 514
  • 539