4

A cursory look at the documentation for QRegexp shows that it supports backreferences, while QRegularExpression makes no mention of it. This would be notable since regular expression matching without backreferences can scale in linear time, while including backreferences scales as exponential time (source [dead link], cached version).

A similar StackOverflow answer also mentions the main differences are in the speed of execution. It would be logical to consider that the new regular expression class could employ a new algorithm, which would allow it to search in linear time, however, I have no direct knowledge of this. Are there any differences similar to the above in the new QRegularExpression class?

Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
Alex Huszagh
  • 13,272
  • 3
  • 39
  • 67

1 Answers1

4

QRegularExpression departs from the functionality of QRegExp in 5 particular areas:

  1. Support for \A and \z
  2. Support for Global Matching
  3. Removal of Wildcard Matching
  4. Removal of support for inferior regex syntaxes (only Perl Compatible Regular Expressions (PCRE) are supported now)
  5. Added support for *? and +? modifiers
  6. Removed support for the un-useful QRegExp::CaretModes with the exception of QRegExp::CaretAtOffset

Of these changes 4 is very relevant to this question. PCRE is by far the most advanced regular expression engine available. So it certainly supports back-references as well as many other features QRegExp could never aspire to.

As far as the rationale behind not explicitly mentioning every functionality provided by PCRE is related to the fact that PCRE is defined by a 70k word document. If you're interested in burning through a toner cartridge, QRegularExpression links directly to this page: http://pcre.org/pcre.txt

Incidentally quoting from that tome these are the back reference sytaxes PCRE supports:

  • \n reference by number (can be ambiguous)
  • \gn reference by number
  • \g{n} reference by number
  • \g{-n} relative reference by number
  • \k<name> reference by name (Perl)
  • \k'name' reference by name (Perl)
  • \g{name} reference by name (Perl)
  • \k{name} reference by name (.NET)
  • (?P=name) reference by name (Python)
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
  • It would be interesting if they ever chose to implement a separate searching algorithm for regexes compiled with and without backreferences, but I guess that's already in the domain of Google's re2. – Alex Huszagh Jun 27 '16 at 18:21
  • @AlexanderHuszagh I'd be interested in a source. I wouldn't have thought that the back reference code would have made any difference in the run-time at all, being gated and everything. – Jonathan Mee Jun 27 '16 at 18:24
  • It's linked in the question under "cached" (the original link went dead). It's specifically about how there is an exponential search time involving backtracking. Perl, for example, limits this if backreferences aren't used (but incompletely). https://webcache.googleusercontent.com/search?q=cache:9MDYjLMkY8QJ:https://swtch.com/~rsc/regexp/regexp1.html&num=1&client=ubuntu&hs=w8b&hl=fr&gl=us&strip=0&vwsrc=0 – Alex Huszagh Jun 27 '16 at 18:27
  • 1
    @AlexanderHuszagh An interesting read to be sure. The author does cite: "As of Perl 5.6, Perl's regular expression engine is said to memoize the recursive backtracking search, which should, at some memory cost, keep the search from taking exponential amounts of time unless backreferences are being used." But apparently that eventually breaks down. I'm all for speed ups including in regex code, but given the choice I'd take PCRE over the alternative, cause by the time I'm working with regexes, I'm so deep in the mud I need a lot of help to get out. – Jonathan Mee Jun 27 '16 at 18:38