9

Wikipedia says that extended regexes "dropped support for backreferences", so the "basic" regex mode has to be used to enable those. However, it seems that a number of implementations do support backreferences for extended regexes. For example, with gcc 4.6 on Ubuntu Precise, they are supported. FreeBSD implementations seem to support them only in basic mode.

Boost says (and seems to agree with Wikipedia) that backreferences are not supported for extended regexes, but Boost::Regex adds them as an extension.

Is this just a poorly defined part of the standard which is interpreted differently by every implementation?

mrk
  • 4,999
  • 3
  • 27
  • 42
Eli Bendersky
  • 263,248
  • 89
  • 350
  • 412
  • Wikipedia is not always the best source for reliable information. I see no reason why backreferences should not be supported in extended regex in any implementation/environment. POSIX standard does not support back-references for extended regular expressions, but I don't know any implementation that would follow this "standard". – Ωmega Nov 10 '12 at 14:43
  • +Ωmega apparently FreeBSD implementations do follow this. Reading the "re_format" manpage, a special "enhanced" mode was added to support backreferences in extended mode – Eli Bendersky Nov 10 '12 at 14:46
  • It seems that `egrep` in FreeBSD supports them. – Ωmega Nov 10 '12 at 14:49
  • OpenBSD, OTOH, does seem to reserve backreferences only to BREs (Basic Regular Expressions) – Eli Bendersky Nov 10 '12 at 14:53

3 Answers3

8

As others have already pointed out, it's quite clear that POSIX EREs do not support back-references.

The rationale given in the OpenGroup Base Specifications Issue 7 for not adding back-references to EREs is given as:

It was suggested that, in addition to interval expressions, back-references ( '\n' ) should also be added to EREs. This was rejected by the standard developers as likely to decrease consensus.

Quoted from: Rationale: Base Definitions: Extended Regular Expressions

The primary reason for this limitation is to allow POSIX EREs to be converted to a deterministic finite automata (DFA), and indeed the original implementation of EREs in Unix was done as a DFA. The use of a DFA allows guarantees to be made about the performance of the implementation. Pattern matching with (an unbounded number of) back references is an NP-hard problem, and perhaps even an NP-complete problem. Consensus in the POSIX standards committee could never have been reached if back-references were proposed for EREs because that would force all the companies using the original Unix implementation to change their code to a non-deterministic implementation and to drop their performance guarantees, and some of those companies had members on the committee.

It has also been noted that back-references in REs are not intuitive for either users or implementors, and indeed they've caused extreme confusion more often than now. See for example the examples given in RE-Interpretation: The Dark Corners

NOTE: back-references in REs are not the same as references to sub-patterns in substitution text in tools such as sed.

Greg A. Woods
  • 2,663
  • 29
  • 26
4

According to the IEEE/Open Group standard Extended Regular Expressions don't support Backreferences (Section 9.5.1) although several real-world implementations do.

Yahia
  • 69,653
  • 9
  • 115
  • 144
1

According to the POSIX.1-2008 standard, only Basic Regular Expressions support back-references. Section 9.3.6 describes how they work in BREs. The Extended Regular Expressions section doesn't mention them at all, and the Grammar Lexical Conventions in section 9.5.1 say that back-reference tokens only apply to BREs.

Adam Rosenfield
  • 390,455
  • 97
  • 512
  • 589