6

I'm reading PCRE documentation and I noticed that possessive quantifier + and once-only subpatterns (?>), also known as atomic groups, are somewhat similar by concept. Is there any substantial difference?

Desmond Hume
  • 8,037
  • 14
  • 65
  • 112

3 Answers3

8

(?>) is actually atomic grouping.

From Atomic Grouping on regular-expressions.info:

An atomic group is a group that, when the regex engine exits from it, automatically throws away all backtracking positions remembered by any tokens inside the group. Atomic groups are non-capturing. The syntax is (?>group).

From Possessive Quantifiers on regular-expressions.info:

Possessive quantifiers are a way to prevent the regex engine from trying all permutations. This is primarily useful for performance reasons. You can also use possessive quantifiers to eliminate certain matches.

From the same page:

Technically, possessive quantifiers are a notational convenience to place an atomic group around a single quantifier. All regex flavors that support possessive quantifiers also support atomic grouping. But not all regex flavors that support atomic grouping support possessive quantifiers. With those flavors, you can achieve the exact same results using an atomic group.

Basically, instead of X*+, write (?>X*). It is important to notice that both the quantified token X and the quantifier are inside the atomic group. Even if X is a group, you still need to put an extra atomic group around it to achieve the same effect. (?:a|b)*+ is equivalent to (?>(?:a|b)*) but not to (?>a|b)*. The latter is a valid regular expression, but it won't have the same effect when used as part of a larger regular expression.

Amal Murali
  • 75,622
  • 18
  • 128
  • 150
anubhava
  • 761,203
  • 64
  • 569
  • 643
1

If you have a look at this page of regular-expressions.info, you will notice in the table that "x++ is identical to (?>x+)".

The only difference noted is:

Possessive quantifiers are a limited yet syntactically cleaner alternative to atomic grouping.

So, it's not as popular as atomic grouping, but it can be considered cleaner.

Jerry
  • 70,495
  • 13
  • 100
  • 144
  • By the way, atomic grouping is synonymous to once-only subpatterns. Just different ways of calling the same thing. – Jerry Aug 17 '13 at 19:20
1

Note that (?>X+) is not exactly the same as X++ in a backtracking point of view. Because inside parenthesis, the regex engine has the possibility to backtrack, thus the regex engine records allways backtrack positions inside an atomic group (but forgets them once the parenthesis closed), it can, of course, not be the case with a possessive quantifier. Example:

 consider the string aaaabbbb

(?>a+)ab as a++ab will fail because the regex engine can't backtrack once the parenthesis of the atomic group closed.

but

(?>a+ab) will succeed because backtrack positions are always recorded inside the atomic group.

(?:a+|ab)+(?<!a)b will succeed, but (?>a+|ab)+(?<!a)b will fails because the parenthesis is closed between each repetitions.

Conclusion: the exact synonymous of (?>X+) is not X++ but (?:X+){1}+

Casimir et Hippolyte
  • 88,009
  • 5
  • 94
  • 125
  • +1 very well explained. Just curious why do we need to use: `{1}` here in `(?:X+){1}+` could we not use `(?:X+)+`? – anubhava May 04 '14 at 05:09
  • @anubhava: Thanks, no `{1}+` is only an artificial way to make atomic the non-capturing group (only for explanation purpose, it is totally useless, since there are atomic groups). `(?:X+)+` will repeat the non capturing group (that is not what I was looking for in the example) and is not atomic or possessive at all. – Casimir et Hippolyte May 04 '14 at 14:57