I've been reading articles about non-capturing groups on this site and elsewhere on the net (such as http://www.regular-expressions.info/brackets.html, http://www.asiteaboutnothing.net/regexp/regex-disambiguation.html, 'What does the "?:^" regular expression mean?', and 'What is a non-capturing group? What does a question mark followed by a colon (?:) mean?').

I am clear on the meaning of (?:foo). What I am unclear about is (?=foo). Is (?=foo) also always a non-capturing group, or does it depend?

Jon Lyles

3 Answers

No, (?=foo) will not capture "foo". Look-around assertions (positive and negative, look-ahead and look-behind) never capture anything by themselves; they only check for the presence (or absence) of text.

For example, the regex:

(X(?=\d+))

matches "X" only when it is followed by one or more digits. However, those digits are not part of match group 1, nor of the overall match.
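As a quick check of that claim, here is a minimal sketch using Python's `re` module (an assumption on my part; the thread's own demo uses PHP, and the variable names `pattern` and `m` are just for illustration). It shows that group 1 holds only "X" and that the pattern fails when no digits follow:

```python
import re

# Group 1 wraps both the literal X and the look-ahead.
pattern = re.compile(r"(X(?=\d+))")

m = pattern.search("X123")
print(m.group(0))   # X
print(m.group(1))   # X
print(m.groups())   # ('X',)

# Without digits after X the look-ahead fails, so there is no match at all.
print(pattern.search("Xabc"))  # None
```

The digits are required for the match to succeed, yet they appear in neither the overall match nor group 1.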

You can put a capturing group inside the look-ahead to capture the text it inspects. For example, the regex:

(X(?=(\d+)))

matches "X" only when it is followed by one or more digits, and those digits are captured in match group 2.

A PHP demo:

<?php
$s = 'X123';
preg_match_all('/(X(?=(\d+)))/', $s, $matches);
print_r($matches);
?>

will print:

Array
(
    [0] => Array
        (
            [0] => X
        )

    [1] => Array
        (
            [0] => X
        )

    [2] => Array
        (
            [0] => 123
        )

)
Bart Kiers
  • +1 Thanks. If I understand your example, it's a kind of conditional: for example, the regex (X(?=\d+)) translated into English means "match X only if that X is followed by one or more digits" – Jon Lyles Jul 11 '12 at 15:12

Lookarounds are always non-capturing and zero-width.
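To make "zero-width" concrete, here is a small sketch in Python's `re` module (my choice of language, not something from the answer; the names `lookahead`, `both` and `empty` are illustrative). It suggests that a look-ahead checks text without consuming it, so the match ends before the digits and a later part of the pattern can still match them:

```python
import re

text = "X123"

# The look-ahead checks for digits but does not consume them:
# the overall match ends right after "X" (index 1).
lookahead = re.search(r"X(?=\d+)", text)
print(lookahead.group(0), lookahead.span())   # X (0, 1)

# Because nothing was consumed, a following (\d+) still matches the same digits.
both = re.search(r"X(?=\d+)(\d+)", text)
print(both.group(0), both.group(1))           # X123 123

# A look-ahead on its own matches an empty string at a position.
empty = re.search(r"(?=\d)", text)
print(repr(empty.group(0)), empty.span())     # '' (1, 1)
```

In the second pattern the only capture is the explicit `(\d+)`; the look-ahead itself contributes no group.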

Andrew Cheong
  • +1 and thank you. Zero-width is a term that confused me. Does it refer to the fact that it doesn't consume any characters? Quote from http://www.asiteaboutnothing.net/regexp/regex-disambiguation.html: "a lookahead or a lookbehind does not "consume" any characters on the string. This means that after looking, the regex engine is back on the same spot on the string from where it started looking." – Jon Lyles Jul 11 '12 at 15:06
    Yes, exactly; likewise, ``\b`` is known as a zero-width assertion. I couldn't find a source definitively saying that lookarounds can't capture, but consider the contrary: if lookarounds _were_ capturing, then by what syntax would one make them _non_-capturing? (There is no syntax such as ``(?:?=...)``.) If one couldn't force a lookaround to be non-capturing, then many, many regexes using lookarounds would break, since the capture group numbers would shift with each lookaround. I believe this is evidence enough to suggest that no engine will ever capture lookarounds by default. – Andrew Cheong Jul 11 '12 at 15:36

Almost every group starting with ? is non-capturing (named groups such as (?<name>foo) are the exception), although only (?:foo) works as a regular group.
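A short sketch of what this looks like in practice, assuming Python's `re` module (not mentioned in the answer; the variable names are illustrative). It shows `(?:...)` being repeated like an ordinary group without creating a capture, an inline option that is not a group around text at all, and, as a caveat, a named group, which also starts with `(?` but does capture:

```python
import re

# (?:ab) behaves like a regular group (it can be repeated) but creates no capture.
repeated = re.fullmatch(r"(?:ab)+", "ababab")
print(repeated.group(0), repeated.groups())   # ababab ()

# (?i) is an inline option, not a group around text: it matches nothing itself.
flagged = re.fullmatch(r"(?i)foo", "FOO")
print(flagged.group(0), flagged.groups())     # FOO ()

# Named groups also start with "(?" yet do capture; Python spells them (?P<name>...).
named = re.fullmatch(r"(?P<word>foo)bar", "foobar")
print(named.group("word"), named.groups())    # foo ('foo',)
```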

Joey
  • By "regular group" I mean whatever you usually do with `(foo)`. You can add quantifiers to repeat it, for example. I'm not really sure that's meaningful with a lookaround (`(?<=...)`, `(?=...)`, `(?!...)`, `(?<!...)`), regex options (`(?i)`, ...), etc. – Joey Jul 11 '12 at 15:27