I am wondering what is the point of using non-capturing groups.

I tested a bunch of simple regexes on regex101.com and found that using a non-capturing group instead of a capturing one does not affect the number of steps it takes to match the string.

For example,

fo(o)

takes 6 steps to match foo.

fo(?:o)

also takes 6 steps.

I know that using a non-capturing group means that there will be one fewer group in the match result, but so what? I can still count which group the text I want is in and get that group. The group number I need may be different, but I can still get the group either way.

For example:

f(o)+(.)\2+

is practically the same as

f(?:o)+(.)\1+
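
A quick way to check this in Python, as a minimal sketch (the sample string `fooxxx` is just an illustrative input I made up): both patterns match the same text, and only the group numbering differs.

```python
import re

# Same pattern twice; only the first group's capturing behaviour differs.
capturing = re.compile(r"f(o)+(.)\2+")
non_capturing = re.compile(r"f(?:o)+(.)\1+")

text = "fooxxx"  # hypothetical sample input

m1 = capturing.fullmatch(text)
m2 = non_capturing.fullmatch(text)

# Both match the whole string; the wanted character is group 2 in the
# first pattern and group 1 in the second.
print(m1.group(0), m1.groups())
print(m2.group(0), m2.groups())
```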

After reading this post, I learnt that with a non-capturing group, the regex engine does not "capture" the characters matched in a non-capturing group by putting the characters into an array. However, isn't that little bit of space to store the captured string negligible?

What is an example of a regex with capturing groups that behaves significantly differently when non-capturing groups are used instead? Alternatively, what is an example of a case where non-capturing groups must be used?

P.S. I don't think the regex flavour matters here, since capturing and non-capturing groups behave the same in every major regex flavour (I think). If I'm wrong, an answer in Python will be fine.

Sweeper
  • In Python, non-capturing groups might be very helpful when used with `re.split` and `re.findall`. Non-capturing groups differ from capturing ones in only one thing: they do not store the capture in memory. That means you cannot use a backreference if you have not declared a capturing group. That is all, I think. – Wiktor Stribiżew Dec 19 '17 at 13:23
  • It may be a relic from old regex tools where you could only have a set number of groups you could reference? sed for example only has 9 groups it can reference (10 if you count \0) – Rob Dec 19 '17 at 13:42
  • @Rob oh! I didn't know that! Thanks for the info. – Sweeper Dec 19 '17 at 13:56
  • As @Rob mentioned, backreferences may be limited to a maximum number of capturing groups. [Many regex flavours support up to 99 capturing groups for backreferences](https://www.regular-expressions.info/backref.html). The article [Backreferences, part 2](https://www.regular-expressions.info/backref2.html) gives additional information. Also, take a look at [this SO question](https://stackoverflow.com/questions/33923192) that discusses PCRE limitations with capturing groups. PCRE apparently has a maximum of 65,535 capture groups. If you ever reach this, however, you're doing it wrong! – ctwheels Dec 19 '17 at 14:24
  • In most regex flavors, a non-capturing group is the only way to get local flags. `(?i:BOB|ALLEN)` makes the match case-insensitive inside the group. There is no way to replicate that with a capturing group alone. Python's `re` supports this syntax only from version 3.6 onward. If you wanted Bob to be case-insensitive but Allen not, and to capture the result, `((?i:BOB)|Allen)` is the most logical way to do that in PCRE. – dawg Dec 19 '17 at 15:35
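
The `re.split` and `re.findall` point from the comments can be sketched in Python as follows. This is only an illustration on a made-up sample string, not a complete answer: in both functions the return value depends on whether the pattern contains a capturing group, so here the two group types are not interchangeable.

```python
import re

text = "a1b22c"  # hypothetical sample input

# re.findall returns the group's content when the pattern has a
# capturing group (for a repeated group, the last iteration) ...
with_group = re.findall(r"(\d)+", text)
# ... and the whole match when the group is non-capturing.
without_group = re.findall(r"(?:\d)+", text)

# re.split keeps captured separators in the result list,
# but drops separators matched by a non-capturing group.
split_with_group = re.split(r"(\d+)", text)
split_without_group = re.split(r"(?:\d+)", text)

print(with_group, without_group)
print(split_with_group, split_without_group)
```

So if a group is needed only for grouping or repetition, making it non-capturing avoids changing what these functions return.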
