116

I have the following Java regex, which I didn't write and I am trying to modify:

^class-map(?:(\\s+match-all)|(\\s+match-any))?(\\s+[\\x21-\\x7e]{1,40})$
           ^                                 ^

It's similar to this one.

Note the first question mark. Does it mean that the group is optional? There is already a question mark after the corresponding ). Does the colon have a special meaning in regex?

The regex compiles fine, and there are already JUnit tests that show how it works. It's just that I'm a bit confused about why the first question mark and colon are there.

Jun
BJ Dela Cruz
  • 4
    "The question mark and the colon after the opening round bracket are the special syntax that you can use to tell the regex engine that this pair of brackets should not create a backreference." http://www.regular-expressions.info/brackets.html – cklab Jul 17 '12 at 21:07

2 Answers

166

`(?:` starts a non-capturing group. For matching purposes it behaves exactly like `(`; the only difference is that it does not create a numbered group that you can retrieve after the match. See "What is a non-capturing group? What does a question mark followed by a colon (?:) mean?".
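To make this concrete for the regex in the question, here is a minimal sketch in Java. The class name `ClassMapGroups`, the helper `groups`, and the sample input lines are made up for illustration; only the pattern itself comes from the question. It suggests that the outer `(?:...)?` wrapper is optional but gets no group number, so the pattern has exactly three numbered groups.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern is taken from the question.
class ClassMapGroups {
    static final Pattern CLASS_MAP = Pattern.compile(
        "^class-map(?:(\\s+match-all)|(\\s+match-any))?(\\s+[\\x21-\\x7e]{1,40})$");

    // Returns groups 1..3, or null if the line does not match at all.
    // A group that did not participate in the match is reported as null.
    static String[] groups(String line) {
        Matcher m = CLASS_MAP.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // The non-capturing wrapper is not counted, so this reports 3 groups.
        System.out.println(CLASS_MAP.matcher("").groupCount());

        // Sample inputs are assumptions, not taken from the original tests.
        System.out.println(Arrays.toString(groups("class-map match-all foo")));
        System.out.println(Arrays.toString(groups("class-map match-any bar")));
        System.out.println(Arrays.toString(groups("class-map foo")));
    }
}
```

With the wrapper written as a plain `(...)`, it would become group 1 and the three existing groups would shift to 2, 3 and 4, which is presumably why the original author chose `(?:`.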

Community
ryanp
  • 1
    I am going to guess this is more efficient too, since it does not need to hold the groups in memory for backreference use... – tmn Apr 29 '15 at 02:56
  • 9
    A little more information: the `?` following `)` is unrelated to `(?:`. The second `?` signifies that the non-capturing group is optional. – ggentzke Nov 17 '15 at 20:52
  • 1
    Thomas N: Yes, a little more efficient, but so little that it's not likely to matter. If efficiency were a consideration, you would be better off coding the operation by hand instead of using regexes, rather than relying on the small gain from non-capturing over capturing groups. IMO, the choice between capturing and non-capturing groups should simply document the intent of the expression. – Tongfa Apr 13 '17 at 16:32
53

A little late to this thread - just to build on ryanp's answer.

Assuming you have the string aaabbbccc

Regular Expression

(a)+(b)+(c)+

This would give you the following 3 groups that matched:

['a', 'b', 'c']

Regular Expression with non-capturing parentheses

Use ?: in the first group:

(?:a)+(b)+(c)+

and you would get the following groups that matched:

['b', 'c']

Hence the name "non-capturing parentheses".
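The two group lists above can be reproduced in Java with a short sketch. The class name `RepeatedGroups` and the helper `captured` are hypothetical names chosen for this example. Note that a repeated capturing group such as `(a)+` keeps only the text of its last iteration, which is why each entry is a single character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for comparing capturing and non-capturing groups.
class RepeatedGroups {
    // Returns the contents of every numbered group if the whole input matches,
    // or an empty list if it does not match.
    static List<String> captured(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three capturing groups.
        System.out.println(captured("(a)+(b)+(c)+", "aaabbbccc"));
        // The first group is non-capturing, so only two groups remain.
        System.out.println(captured("(?:a)+(b)+(c)+", "aaabbbccc"));
    }
}
```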

Example use case:

Sometimes you use parentheses for other things, for example to set the bounds of the | (or) operator:

"New (York|Jersey)"

In this case, you are only using the parentheses to delimit the | alternation, and you don't really want to capture this data. Use non-capturing parentheses to indicate that:

"New (?:York|Jersey)"
Martin Konecny
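The "New (York|Jersey)" use case from the answer above can be checked in Java with a small sketch. The class name `Alternation`, its two helpers, and the sample inputs are assumptions made for illustration. Both spellings accept the same strings; they differ only in how many numbered groups the pattern exposes.

```java
import java.util.regex.Pattern;

// Hypothetical demo: grouping for alternation with and without capturing.
class Alternation {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Number of capturing groups declared in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String capturing = "New (York|Jersey)";
        String nonCapturing = "New (?:York|Jersey)";

        // Both patterns accept and reject the same inputs.
        System.out.println(matches(capturing, "New York"));
        System.out.println(matches(nonCapturing, "New Jersey"));
        System.out.println(matches(nonCapturing, "New Mexico"));

        // Only the first pattern creates a numbered group.
        System.out.println(groupCount(capturing));
        System.out.println(groupCount(nonCapturing));
    }
}
```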