70

From the Java 6 Pattern documentation:

Special constructs (non-capturing)

(?:X)   X, as a non-capturing group

(?>X)   X, as an independent, non-capturing group

What is the difference between (?:X) and (?>X)? What does "independent" mean in this context?

user16320675
Peter Hart

4 Answers

51

It means that the grouping is atomic, and it throws away backtracking information for a matched group. So, this expression is possessive; it won't back off even if doing so is the only way for the regex as a whole to succeed. It's "independent" in the sense that it doesn't cooperate, via backtracking, with other elements of the regex to ensure a match.
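To see the "won't back off" behaviour concretely, here is a minimal sketch. It assumes whole-input matching via `Matcher.matches()`; the class and method names are invented for illustration and do not come from the documentation.

```java
import java.util.regex.Pattern;

final class IndependentGroupDemo {
    // Returns true if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Non-capturing group: a+ first takes "aaa", then gives one 'a' back
        // so that the trailing 'a' can match.
        System.out.println(matchesWhole("(?:a+)a", "aaa")); // true

        // Independent (atomic) group: a+ takes "aaa" and refuses to give
        // anything back, so the trailing 'a' has nothing left to match.
        System.out.println(matchesWhole("(?>a+)a", "aaa")); // false
    }
}
```

The only difference between the two patterns is `?:` versus `?>`, so the change in result comes entirely from the discarded backtracking information.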

erickson
14

I think this tutorial explains exactly what an "independent, non-capturing group", or "atomic grouping", is:

The regular expression a(bc|b)c (capturing group) matches abcc and abc. The regex a(?>bc|b)c (atomic group) matches abcc but not abc.

When applied to abc, both regexes will match a to a, bc to bc, and then c will fail to match at the end of the string. Here their paths diverge. The regex with the capturing group has remembered a backtracking position for the alternation. The group will give up its match, b then matches b and c matches c. Match found!

The regex with the atomic group, however, exited from an atomic group after bc was matched. At that point, all backtracking positions for tokens inside the group are discarded. In this example, the alternation's option to try b at the second position in the string is discarded. As a result, when c fails, the regex engine has no alternatives left to try.
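The tutorial's example can be checked in Java. This is a sketch only, assuming `Matcher.matches()` (whole-input matching) as the test; the helper name `matchesWhole` is hypothetical.

```java
import java.util.regex.Pattern;

final class AtomicAlternationDemo {
    // Returns true if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String capturing = "a(bc|b)c";
        String atomic = "a(?>bc|b)c";

        // Both patterns match "abcc": the first alternative "bc" works.
        System.out.println(matchesWhole(capturing, "abcc")); // true
        System.out.println(matchesWhole(atomic, "abcc"));    // true

        // On "abc" the capturing group backtracks to the alternative "b";
        // the atomic group has already discarded that alternative.
        System.out.println(matchesWhole(capturing, "abc"));  // true
        System.out.println(matchesWhole(atomic, "abc"));     // false
    }
}
```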

kajibu
6

If you have foo(?>(co)*)co, that will never match: the atomic group consumes every "co" available, and it won't give one back for the trailing co. I'm sure there are practical examples of when this would be useful; try O'Reilly's book.
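A quick way to convince yourself, sketched under the assumption that `Matcher.find()` (search anywhere in the input) is the right test; the sample inputs and the helper name `found` are arbitrary choices for illustration.

```java
import java.util.regex.Pattern;

final class NeverMatchesDemo {
    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String atomic = "foo(?>(co)*)co";
        String plain = "foo(?:(co)*)co";

        for (String input : new String[] {"fooco", "foococo", "foocoXco"}) {
            // The atomic group keeps every "co" it consumed, so the final
            // "co" in the pattern can never be satisfied.
            System.out.println(input + " atomic: " + found(atomic, input));
            // The ordinary group backtracks and releases one "co".
            System.out.println(input + " plain:  " + found(plain, input));
        }
    }
}
```

For these inputs the atomic pattern reports no match each time, while the plain non-capturing version finds one.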

Vlad
-3

(?>X?) equals (?:X)?+, (?>X*) equals (?:X)*+, (?>X+) equals (?:X)++.

Dropping the explicit non-capturing group around X (the quantifier must still apply to all of X), the equivalences become:

(?>X?) equals X?+, (?>X*) equals X*+, (?>X+) equals X++.
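These equivalences can be spot-checked for a simple case. The sketch below takes X to be the single character `a` and compares each atomic form with its possessive counterpart on a few inputs; the inputs, the trailing `a` in each pattern, and the method names are assumptions for illustration. It is a sanity check, not a proof.

```java
import java.util.regex.Pattern;

final class PossessiveEquivalenceDemo {
    // Returns true if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Returns true if both patterns give the same result on every input.
    static boolean agree(String left, String right, String... inputs) {
        for (String input : inputs) {
            if (matchesWhole(left, input) != matchesWhole(right, input)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] inputs = {"", "a", "aa", "aaa", "b"};
        // A trailing 'a' makes the missing backtracking observable.
        System.out.println(agree("(?>a?)a", "a?+a", inputs)); // true
        System.out.println(agree("(?>a*)a", "a*+a", inputs)); // true
        System.out.println(agree("(?>a+)a", "a++a", inputs)); // true
        // For contrast, the greedy form differs from the atomic one on "a".
        System.out.println(agree("(?>a?)a", "a?a", inputs));  // false
    }
}
```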

beibichunai
  • The word `independent` in [the Pattern JavaDocs](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) is important. They aren't exactly the same, because `(?>X)` doesn't do any backtracking when a partial match fails, so some things that match using one will not match using the other. The [article @erickson linked to was helpful for me.](http://www.regular-expressions.info/atomic.html) – xdhmoore Aug 31 '17 at 17:26
  • Sorry, I'm not into this currently, so maybe my answer is not accurate. But from your own reference: "Most of these also support possessive quantifiers, which are essentially a notational convenience for atomic grouping." This is what I was trying to express. In the latter case, the additional '+' character denotes a possessive quantifier. – beibichunai Dec 06 '17 at 21:18
  • `[?/*/+]` is the same as `[?+*/]` and is a [character class](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#classes) matching one of the 4 characters (`?`, `+`, `*`, `/`), and the `+` after the `]` is a [greedy quantifier](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#greedy) that makes the character class match *one or more times*. There is no [possessive quantifier](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#poss) anywhere in your regex. – Andreas May 15 '20 at 19:52
  • (?>X?) equals (?:X)?+, (?>X*) equals (?:X)*+, (?>X+) equals (?:X)++. – beibichunai May 28 '20 at 11:08
  • I don't like this answer because it only works when `X` is itself an atom, and that isn't necessarily the case in [the documentation](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) -- sometimes X _must be_ an atom (e.g. `X?+`) and sometimes it need not be (e.g. `(X)`). Specifically, `(?>X?)` equals `X?+` _only_ if X is an atom: `(?>foo?)` is not equivalent to `(?:foo)?+`. – Jelaby May 25 '22 at 12:50
  • Jelaby, where did you read in the doc that X must be an atom in X?+ ? Could you give an example where X would not be an atom in X?+ ? – beibichunai May 26 '22 at 19:57
  • I see that downvotes go unexplained, as expected. `?+` can be applied to anything, atom or not, as long as it is an expression. The same goes for `(?> )`. The rule does not apply when X is not an expression within the pattern: `AB?+` is equivalent to `A(?>B?)`, not `(?>AB?)`, because the quantifier applies only to `B`. Nor can the rule be applied in reverse to `(?>AB?)`, because `AB` is not an expression there; `B` is, but `(?> )` is applied not to the result `B?` but to a different expression, `AB?`. Obviously you have to know precedence to apply a rule. The problem is not atomicity; it is applying the rule regardless of the expression's structure. – beibichunai Jun 02 '22 at 04:29
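The precedence point debated in these comments can be illustrated with a short sketch. The patterns `a*b?+a`, `a*(?>b?)a`, and `(?>a*b?)a` are hypothetical stand-ins for `AB?+`, `A(?>B?)`, and `(?>AB?)` with a trailing `a` appended so the difference is observable; the helper name is likewise made up.

```java
import java.util.regex.Pattern;

final class PrecedenceDemo {
    // Returns true if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // (?>foo?) is "fo" plus an optional 'o'; (?:foo)?+ is an optional "foo".
        System.out.println(matchesWhole("(?>foo?)", "fo"));   // true
        System.out.println(matchesWhole("(?:foo)?+", "fo"));  // false

        // The possessive quantifier binds only to b, so a* may still backtrack.
        System.out.println(matchesWhole("a*b?+a", "aa"));     // true
        System.out.println(matchesWhole("a*(?>b?)a", "aa"));  // true
        // Wrapping a* inside the atomic group removes that freedom.
        System.out.println(matchesWhole("(?>a*b?)a", "aa"));  // false
    }
}
```

Read this way, both commenters have a point: the rewrite is valid when the quantifier's operand and the group's contents are the same subexpression, and it breaks when textual substitution changes what the quantifier binds to.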