4

I'm aware of many of the great answer about atomic grouping, e.g., Confusion with Atomic Grouping - how it differs from the Grouping in regular expression of Ruby?

My question is simple: so alternation in atomic grouping is useless, right?

Some examples:

  • a(?>bc|b)c will never match abc, actually it will never try b part in the ()
  • (?>.*|b*)[ac] will never match any string since .* matches them all and is discarded.

Do I understand it right?


Some test code in perl just in case it might be helpful

sub tester {
    my ($txt, $pat) = @_;
    if ($txt =~ $pat) {
        print "${pat} matches ${txt}\n";
    } else {
        print "No match\n";
    }
}

$txt = "abcc";
$pat = qr/a(?>bc|b)c/;
tester($txt, $pat);

$txt = "bbabbbabbbbc";
$pat = qr/(?>.*)c/;
tester($txt, $pat);

$pat = qr/(?>.*|b*)[ac]/;
tester($txt, $pat);

$txt = "abadcc";
$pat = qr/a(?>b|dc)c/;
tester($txt, $pat);

I found an explanation in here that kinda answers my question.

It (atomic grouping) tells the RegEx engine that once it has found a matching subpattern not to backtrack on any of the quantifiers or alternative that may be inside of it.

Community
  • 1
  • 1
gongzhitaao
  • 6,566
  • 3
  • 36
  • 44

1 Answers1

6

It's not useless in the general case.

It doesn't work in your examples because the first choice always matches, therefore the second choice won't be tried. And the engine won't backtrack inside the atomic group, because this is pretty much the purpose of an atomic group.

If you have two disjoint patterns in an atomic group, that will make perfect sense
(something like (?>ab|cd)).

Lucas Trzesniewski
  • 50,214
  • 11
  • 107
  • 158
  • So `trying alternatives` and `backtracking` are two different concepts or processes? – gongzhitaao Apr 04 '16 at 13:59
  • Backtracking is what happens when the current state doesn't allow the engine to match further. In such cases, it needs to try another solution (for instance, it could try a different choice in an alternation). Atomic groups are treated like a single entity, which either matches or doesn't match at a given position. Once it matched, it can only match in a single unique way. – Lucas Trzesniewski Apr 04 '16 at 14:02
  • So actually all the alternatives in the regex will be tried but previous matched strings in the text will not be matched again (i.e., no backtracking). Is this somewhere closer? – gongzhitaao Apr 04 '16 at 14:08
  • I updated an example to illustrate my understanding. – gongzhitaao Apr 04 '16 at 14:12
  • 1
    Just to precise the term "disjoint": it means subpatterns that cannot match the same text at the same position. – Wiktor Stribiżew Apr 05 '16 at 10:14