
I'm using rubular.com to build my regex, and their documentation describes the following:

(...)   Capture everything enclosed
(a|b)   a or b

How can I use an OR expression without capturing what's in it? For example, say I want to capture either "ac" or "bc". I can't use the regex

(a|b)(c)

right? Since then I capture either "a" or "b" in one group and "c" in another, not the same. I know I can filter through the captured results, but that seems like more work...

Am I missing something obvious? I'm using this in Java, if that is pertinent.

user_
goggin13

4 Answers


Depending on the regular expression implementation, you can use so-called non-capturing groups with the syntax (?:…):

((?:a|b)c)

Here (?:a|b) groups the alternation, but you cannot reference its match. The only match you can reference is that of the outer group ((?:a|b)c), which is either ac or bc.
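
Since the question mentions Java, here is a minimal sketch of the difference using java.util.regex. The class and helper names (NonCapturingDemo, firstGroup, groupCount) are made up for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Hypothetical helper: text of capturing group 1, or null if the
    // whole input does not match the regex.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Hypothetical helper: number of capturing groups the pattern defines
    // (group 0, the whole match, is not counted).
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Capturing alternation: "a" and "c" land in separate groups.
        System.out.println(groupCount("(a|b)(c)"));          // 2
        System.out.println(firstGroup("(a|b)(c)", "ac"));    // a

        // Non-capturing alternation inside a single capturing group.
        System.out.println(groupCount("((?:a|b)c)"));        // 1
        System.out.println(firstGroup("((?:a|b)c)", "ac"));  // ac
        System.out.println(firstGroup("((?:a|b)c)", "bc"));  // bc
    }
}
```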

Gumbo
  • I thought the idea was not to capture the `a` or `b` at all. In other words, to *match* `ac` or `bc`, but only *capture* the `c`: `(?:a|b)(c)` – Alan Moore Jul 31 '10 at 21:16
  • @AlanMoore Is it possible to capture one and not the other in the or statement? So I'm looking for the pattern `ac` or `ab`, but I want to output `ab` if the match is `ab`, and only `c` if the match is `ac`. – Moondra Aug 03 '17 at 21:12

If your implementation has it, then you can use non-capturing parentheses:

(?:a|b)
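
As a rough Java illustration (Java's java.util.regex does support this syntax): used on its own, (?:a|b)c defines no numbered groups at all, and the matched text is still available as the whole match. The class and helper names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeMatchDemo {
    // Hypothetical helper: first substring of input matched by regex,
    // or null if there is no match anywhere in the input.
    static String wholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;  // group() is the whole match (group 0)
    }

    public static void main(String[] args) {
        String regex = "(?:a|b)c";

        // No capturing groups are defined by the non-capturing parentheses.
        System.out.println(Pattern.compile(regex).matcher("").groupCount()); // 0

        // The whole match is still reachable without any capturing group.
        System.out.println(wholeMatch(regex, "xxbcxx")); // bc
    }
}
```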
Marc Mutz - mmutz

If your OR alternatives are all single characters, you can just use the "character set" operator:

([ab]c)

It matches only ac or bc, and it's more readable.
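
A quick way to check this in Java, as a sketch (the class and helper names are made up for illustration):

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String regex = "([ab]c)";
        System.out.println(matchesWhole(regex, "ac")); // true
        System.out.println(matchesWhole(regex, "bc")); // true
        System.out.println(matchesWhole(regex, "cc")); // false
    }
}
```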

yrtimiD

Even rubular doesn't make you use parentheses, and the precedence of | is low. For example, a|bc does not match ccc.

msw
  • what does the '!~' operator do? I like your expression, with fewer parens, regex is messy enough already – goggin13 Jul 31 '10 at 16:09
  • !~ is a perlism for "does not match", it was sloppy writing on my part; fixed, thanks. – msw Jul 31 '10 at 16:15
  • I don't get you. The low precedence of `|` is why you *do* have to use parens. `(?:a|b)c` matches `ac` or `bc` (the desired behavior), while `a|bc` matches `a` or `bc`. – Alan Moore Jul 31 '10 at 21:29