4

I was looking at this answer to this question: Regex nested parentheses, and was thinking that instead of a quantified atomic group (?> list | of | alternates )* it should have been an atomic quantified group (?> (?: list | of | alternates )* ). Am I wrong? Are they the same or different in the world of regex? Especially in terms of the .NET implementation?

I personally would think them different, and I usually use perl regexes which would translate to (?: list | of | alternates )*+. This to me is much clearer anyway, stating that I want to backtrack before this particular regex if needed (an atomic quantified group). However, perhaps this is something that was implemented as a design decision where the train of thought was that quantified atomic group is not useful?

Adrian
  • 10,246
  • 4
  • 44
  • 110

1 Answers1

3

When an atomic group is called as an independent expression,
backtracking does take place inside of it, just like anywhere else.

The difference is that an atomic group cannot control the backtracking
mechanism externally.

So, each quantified pass of an atomic group only counts for a single
instance that will not cause backtracking.

However, if you put the quantifier on a cluster group inside a non-quantified
atomic group, the affect is that the entire contents will not influence
external backtracking.

It's the granularity that is important.

Example

(?>a|b|c)*abc will match aaaaaabbbbbbbbbbbabc

where as

(?>(?:a|b|c)*)abc will not match aaaaaabbbbbbbbbbbabc
because the (?:a|b|c)* clause consumes it all, leaving no room for it to
find abc.

A good rule of thumb is:

If a quantifier is external to an atomic group, it can control backtracking
externally.

If a quantifier is internal to an atomic group, it can control backtracking
internally only.

And, when you quantify an atomic group, on each pass, the flow exits the
group, which makes that pass' results (as a whole) eligible to be backtracked.

  • Good explanation with example – Gurmanjot Singh Jan 19 '18 at 02:20
  • This is what I thought. So am I correct that perl's `(?: list | of | alternates )*+` is equivalent to `(?> (?: list | of | alternates )* )` or am I wrong? This would also mean that the example [here](https://stackoverflow.com/a/19597456/1366368) would benefit by having a atomic quantified group for the parentheses matching, as it wouldn't backtrack into an already matched parenthesised group (as well as having an atomic group outside of the 1st level parenthesis). – Adrian Jan 19 '18 at 15:43
  • Confirmed that `(?: list | of | alternates )*+` is equivalent to `(?> (?: list | of | alternates )* )`. Thanks for the examples. Helped a lot. – Adrian Jan 19 '18 at 15:53
  • @Adrian - It makes sense since `(?:..)*+` the possessive modifier acts on the result of quantified group. –  Jan 20 '18 at 20:45