4

All the examples I have found for the atomic group construct (?>...) could be coded with simpler constructs, and the explanation of its better efficiency is confusing (at least to me). Does anyone have a practical use for this construct?

TonyR
  • @Jan Are you saying a non-capturing group is atomic the same as an atomic group? – revo Mar 25 '18 at 08:24
  • Possible duplicate of [Regex lookahead, lookbehind and atomic groups](https://stackoverflow.com/questions/2973436/regex-lookahead-lookbehind-and-atomic-groups) – revo Mar 25 '18 at 08:26

3 Answers

0

Observe the following regex

\b(integer|intrinsic|intractable|intergalactic)\b

It is equivalent to

\bint(eger|rinsic|ractable|ergalactic)\b

Which is in turn equivalent to

\bint(e(ger|rgalactic)|r(insic|actable))\b

However, suppose this last regex matches the start of the string "integers" and then fails because the word boundary \b does not match. The regex will backtrack and try all the remaining options of this small lexical tree. If you add atomic grouping:

\bint(?>e(ger|rgalactic)|r(insic|actable))\b

You can use your knowledge that these possible matches are mutually exclusive (which they might very well be) to make the engine stop backtracking once it has matched one of the options completely.
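To make this concrete, here is a minimal Python sketch that compares the plain and the atomic version of the pattern on a few words. The `atomic` helper is hypothetical, not part of any library: Python's `re` module accepts `(?>...)` natively only from version 3.11, so on older versions the helper falls back to the well-known lookahead-plus-backreference emulation. The sketch only shows that the results are identical when the alternatives are mutually exclusive; it does not measure the backtracking that is saved.

```python
import re
import sys

def atomic(inner):
    # Hypothetical helper: native (?>...) needs Python 3.11+. Older versions get
    # the emulation (?=(...)) followed by a backreference, which is also atomic.
    if sys.version_info >= (3, 11):
        return "(?>" + inner + ")"
    return "(?=(?P<atom>" + inner + "))(?P=atom)"

# The small lexical tree from the answer, written with non-capturing groups.
branches = r"e(?:ger|rgalactic)|r(?:insic|actable)"
plain = re.compile(r"\bint(?:" + branches + r")\b")
guarded = re.compile(r"\bint" + atomic(branches) + r"\b")

words = ["integer", "intrinsic", "intractable", "intergalactic", "integers", "int"]
plain_hits = [w for w in words if plain.search(w)]
guarded_hits = [w for w in words if guarded.search(w)]

print(plain_hits)    # the four listed words; "integers" and "int" are rejected
print(guarded_hits)  # same list: the atomic group changes the work done, not the result
```

On "integers" both patterns fail at the final \b. The difference is that the plain version first retries the remaining branches, while the atomic version gives up immediately.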

Veltzer Doron
  • Veltzer Thanks for your detailed explanation, but I am still missing something. For simplicity, when I write (integer|intrinsic), I have two mutually exclusive terms; if "integer" does not match, I have to backtrack to check whether "intrinsic" matches. What is the difference? Is it only that (?>integer|intrinsic) does not capture? – TonyR Mar 25 '18 at 16:19
  • If one of the atomic options matches, it will not backtrack. The point of my explanation was the \b: if one of the lexical words matches but the \b doesn't, you can tell the regex engine not to backtrack (thus saving run time), since you know the options are mutually exclusive. This is a simple case with constants that the regex compiler could have handled itself, but in general there could be wildcards, and such knowledge could help the regex at run time. – Veltzer Doron Aug 14 '18 at 21:32
  • 1
    One practical example of an atomic group is the regex to match digits not followed by a period, which I found in a recognized, authoritative regex book (at least in an early edition): \d+(?!\.) which should be written as (?>\d+)(?!\.) – TonyR Dec 05 '19 at 06:33
0

A better example is the pattern \d+(?!\.), which is intended to match a number not followed by a period. This regex matches 12, but it also matches the 12 in "123." because \d+ backtracks and gives up the 3. An atomic group solves this problem: (?>\d+)(?!\.)
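The difference can be checked with a short Python sketch. As an assumption for portability, the hypothetical `atomic` helper below uses the native `(?>...)` syntax on Python 3.11+ and otherwise the lookahead-plus-backreference emulation; `found` is likewise just an illustrative helper.

```python
import re
import sys

def atomic(inner):
    # Hypothetical helper: native (?>...) needs Python 3.11+. Older versions get
    # the emulation (?=(...)) followed by a backreference, which is also atomic.
    if sys.version_info >= (3, 11):
        return "(?>" + inner + ")"
    return "(?=(?P<atom>" + inner + "))(?P=atom)"

digits_plain = re.compile(r"\d+(?!\.)")
digits_atomic = re.compile(atomic(r"\d+") + r"(?!\.)")

def found(pattern, text):
    # Return the matched text of the first match, or None if there is no match.
    m = pattern.search(text)
    return m.group(0) if m else None

print(found(digits_plain, "12"))     # 12
print(found(digits_atomic, "12"))    # 12
print(found(digits_plain, "123."))   # 12   (\d+ gave back the 3 to satisfy the lookahead)
print(found(digits_atomic, "123."))  # None (the atomic group refuses to give digits back)
```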

TonyR
-1

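A better example is ((?>books)|book)(s). The string "books" is matched as "book" + "s".

Note the location of ) and |: the atomic group encloses only the first alternative, so after (s) fails the engine can still fall back to the second alternative, book.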

This is a contrived example (but better than the often-repeated a(?>bc|b)c which, as far as I understand, has no purpose, because it matches only "abcc" and never "abc"), so I am still looking for a practical example of (?>...). Or does nobody use this construct?
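Both claims can be verified with a small Python sketch. The `atomic` helper is again hypothetical (native `(?>...)` on Python 3.11+, the lookahead-plus-backreference emulation otherwise), and the named groups `stem` and `suffix` are my own labels, used so that the group lookups work under either form.

```python
import re
import sys

def atomic(inner):
    # Hypothetical helper: native (?>...) needs Python 3.11+. Older versions get
    # the emulation (?=(...)) followed by a backreference, which is also atomic.
    if sys.version_info >= (3, 11):
        return "(?>" + inner + ")"
    return "(?=(?P<atom>" + inner + "))(?P=atom)"

# ((?>books)|book)(s): the atomic group wraps only the first alternative.
books = re.compile(r"(?P<stem>" + atomic("books") + r"|book)(?P<suffix>s)")
m = books.fullmatch("books")
print(m.group("stem"), m.group("suffix"))  # book s

# a(?>bc|b)c: the alternation sits inside the atomic group, so b is never retried.
contrived = re.compile("a" + atomic("bc|b") + "c")
ordinary = re.compile("a(?:bc|b)c")
print(contrived.fullmatch("abc"))              # None
print(contrived.fullmatch("abcc") is not None) # True
print(ordinary.fullmatch("abc") is not None)   # True
```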

TonyR