7

What is the purpose of the passive group in a JavaScript regex?

The passive (non-capturing) group is written with a question mark and a colon after the opening parenthesis: (?:group)

In other words, these two expressions appear to behave identically:

"hello world".match(/hello (?:world)/)
"hello world".match(/hello world/)

In what situations do you need the non-capturing group, and why?

Breck

6 Answers

18

Two use cases for capturing groups

A capturing group in a regex actually serves two distinct purposes (as the name "capturing group" itself suggests):

  1. Grouping: if you need part of the pattern to be treated as a single entity so that an operator, such as a quantifier, applies to the whole of it.

    Probably the most trivial example is an optional sequence of characters, e.g. "foo" optionally followed by "bar", in regex terms: /foo(bar)?/ (capturing group) or /foo(?:bar)?/ (non-capturing group). Note that the trailing ? applies to the whole group (bar), which in this case is just the character sequence bar. If you only want to check whether the input matches your regex, it doesn't matter whether you use a capturing or a non-capturing group; they behave the same (except that a non-capturing group may be slightly faster, since the engine has nothing to remember).

  2. Capturing: if you need to extract part of the input.

    For example, you want to get the number of rabbits from an input like "The farm contains 8 cows and 89 rabbits" (not very good English, I know). The regex could be /(\d+)\s*rabbits\b/. On a successful match, you can read the value matched by the capturing group from JavaScript code (or any other programming language).

    In this example, you have a single capturing group. In JavaScript, index 0 of the match result holds the whole match, so you access the group via index 1 (see this answer for details).

    Now imagine you want to ensure that the "place" is called "farm" or "ranch". If it isn't, you don't want to extract the number of rabbits (in regex terms, you don't want the regex to match).

    So you rewrite your regex as follows: /(farm|ranch).*\b(\d+)\s*rabbits\b/. The regex itself works, but your JavaScript is broken: there are two capturing groups now, and you must change your code to read the second capturing group for the number of rabbits (i.e. change the index from 1 to 2). The first group now contains the string "farm" or "ranch", which you didn't intend to extract.

    A non-capturing group comes to the rescue: /(?:farm|ranch).*\b(\d+)\s*rabbits\b/. It still matches either "farm" or "ranch", but doesn't capture it, so the indexes of subsequent capturing groups don't shift. Your JavaScript code keeps working without changes.
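
To make the index shift concrete, here is a minimal sketch in JavaScript using the sentence from the example. The variable names are only illustrative. Note that in the array returned by match(), index 0 holds the whole match, so the first capturing group sits at index 1.

```javascript
// Illustrative input taken from the example above.
const input = "The farm contains 8 cows and 89 rabbits";

// One capturing group: the rabbit count is group 1.
const simple = input.match(/(\d+)\s*rabbits\b/);
console.log(simple[1]); // 89

// Capturing the place name pushes the rabbit count to group 2.
const shifted = input.match(/(farm|ranch).*\b(\d+)\s*rabbits\b/);
console.log(shifted[1], shifted[2]); // farm 89

// A non-capturing group checks the place name without shifting anything.
const fixed = input.match(/(?:farm|ranch).*\b(\d+)\s*rabbits\b/);
console.log(fixed[1]); // 89
```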


The example may be oversimplified, but consider a very complex regex with many groups, of which you want to capture only a few. Non-capturing groups are really helpful then: you only have to count the capturing groups, not all of them.

Besides, non-capturing groups serve documentation purposes: for someone who reads your code, a non-capturing group indicates that you are not interested in extracting its contents; you just want to ensure that it matches.


A few words on separation of concerns

Capturing groups are a typical example of breaking the separation of concerns (SoC) principle. One syntax construct serves two distinct purposes. Once the problems with this became evident, an additional construct, (?:), was introduced to disable one of the two features.

It was simply a design mistake. Maybe a lack of "free special characters" played its role, but it was still a poor design.

Regex is a very old, powerful and widely used concept. For reasons of backwards compatibility, this flaw is unlikely ever to be fixed. It is a lesson in how important the separation of concerns is.

Alex Shesterov
11

Non-capturing groups differ from "normal" (capturing) groups in just one way: they don't require the regex engine to remember what they matched.

The use case is that sometimes you must (or should) use a group not because you are interested in what it captures but for syntactic reasons. In these situations it makes sense to use a non-capturing group instead of a "standard" capturing one because it is less resource intensive -- but if you don't care about that, a capturing group will behave in the exact same manner.

Your specific example does not make a good case for non-capturing groups, precisely because the two expressions are identical. A better example, where the group is needed to limit the alternation, might be:

input.match(/hello (?:world|there)/)
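
As a rough illustration of that syntactic role: without a group, the | operator splits the entire pattern, so the parentheses are what restrict the choice to the two words. The sample strings below are made up for the demonstration.

```javascript
// With the group, "hello " must be followed by "world" or "there".
const grouped = /hello (?:world|there)/;
// Without it, the pattern means "hello world" OR just "there".
const ungrouped = /hello world|there/;

const groupedHello = grouped.test("hello there");  // true
const groupedOver = grouped.test("over there");    // false
const ungroupedOver = ungrouped.test("over there"); // true, probably not intended
console.log(groupedHello, groupedOver, ungroupedOver);

// Nothing is captured, so the result holds only the whole match.
const resultLength = "hello there".match(grouped).length;
console.log(resultLength); // 1
```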
ibrahim mahrir
Jon
  • 3
    Technically, you don't ever *need* a regexp in the first place, much like any abstraction layer or method of "code" reuse. I'm not sure the pedanticism is a useful thing to lead with. – millimoose Sep 02 '13 at 18:12
  • 1
    @millimoose: Thanks for the feedback, but I don't see that as pure pedanticism. One can be writing regexes successfully for years while completely ignoring the existence of non-capturing groups, so IMHO they don't quite fall into the same category as regexes themselves or other things like quantifiers etc. "When do I need to use `+`" would not have warranted this response. – Jon Sep 02 '13 at 18:17
  • 1
    I'm not saying it's not a valid point, just that it's hardly *the* most important thing to know about non-capturing groups. Seems more suitable as an aside or coda than as the very first sentence. (I admit right now I'm the one being highly pedantic.) – millimoose Sep 02 '13 at 19:29
  • 1
    @millimoose: That's a fair point, I edited the answer and hopefully the current version will look better to you as it also does to me. Thank you again for the feedback and well done for the smooth self-critical comment at the end; it came off as humorous and left a pleasant conversational aftertaste. – Jon Sep 02 '13 at 20:06
5

In addition to the answers above, if you're using String.prototype.split() and you use a capturing group, the output array contains the captured results (see MDN). If you use a non-capturing group that doesn't happen.

var myString = 'Hello 1 word. Sentence number 2.';
var splits = myString.split(/(\d)/);

console.log(splits);

Outputs:

["Hello ", "1", " word. Sentence number ", "2", "."]

Whereas swapping /(\d)/ for /(?:\d)/ results in:

["Hello ", " word. Sentence number ", "."]
Ben Creasy
3

When you want to apply quantifiers to the group as a whole.

/hello (?:world)?/
/hello (?:world)*/
/hello (?:world)+/
/hello (?:world){3,6}/

etc.
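
A small sketch of why the group matters here: a quantifier binds only to the item immediately before it, so without the parentheses it would repeat just the final letter. The ^ and $ anchors and the test strings are additions for this illustration, to make the difference visible.

```javascript
// The group makes the whole word optional or repeatable.
const optional = /^hello (?:world)?$/.test("hello ");
const repeated = /^hello (?:world)+$/.test("hello worldworld");

// Without the group, + applies only to the trailing "d".
const bareWord = /^hello world+$/.test("hello worldworld");
const bareLetters = /^hello world+$/.test("hello worlddd");

console.log(optional, repeated, bareWord, bareLetters); // true true false true
```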

Quentin
3

Use them when you need alternation and don't care which of the choices causes the match.

Non-capturing groups can simplify the result of matching a complex expression. Here, group 1 is always the text matched by (.+) after "said". If the alternation were a capturing group, it would become group 1 and the text you want would move to group 2.

/hello (?:world|foobar )?said (.+)/
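
A minimal sketch of the numbering, using made-up input strings. Group 1 is the text matched by (.+) whether or not the optional alternation takes part in the match.

```javascript
const pattern = /hello (?:world|foobar )?said (.+)/;

// The optional alternation matched "foobar ".
const withChoice = "hello foobar said hi".match(pattern);
// The optional alternation matched nothing.
const withoutChoice = "hello said bye".match(pattern);

console.log(withChoice[1]);    // hi
console.log(withoutChoice[1]); // bye

// With a capturing alternation, the wanted text moves to group 2:
// capturing[1] is "foobar " and capturing[2] is "hi".
const capturing = "hello foobar said hi".match(/hello (world|foobar )?said (.+)/);
console.log(JSON.stringify(capturing.slice(1)));
```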

quietmint
0

I have just found a different use for it. I was trying to capture a repeated, nested group, but wanted the whole run of repetitions as a single element.

So for AbbbbC

(A)((?:b)*)(C)

gives three groups: A, bbbb and C

For AC it also gives three groups: A, an empty string (not null, because the group still takes part in the match) and C
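
A quick check of this in JavaScript, as a sketch. Because the inner group is non-capturing, the repeated b's are reported once, as a whole, in group 2. When there are no b's, group 2 is an empty string in JavaScript.

```javascript
const nested = /(A)((?:b)*)(C)/;

const some = "AbbbbC".match(nested);
console.log(some.slice(1)); // [ 'A', 'bbbb', 'C' ]

const none = "AC".match(nested);
console.log(none.slice(1)); // [ 'A', '', 'C' ]

// For contrast: a capturing inner group adds a fourth group that
// keeps only the last repetition.
const inner = "AbbbbC".match(/(A)((b)*)(C)/);
console.log(inner.slice(1)); // [ 'A', 'bbbb', 'b', 'C' ]
```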

hum3