If you use look-ahead to match either option first, then capture again in a second pass, you can get the match into a single capture.
The simplest approach uses two helper captures as well:
(?=\[(.+?)\]|\((.+?)\))[(\[](\1\2)[)\]]
It works by matching either [...] or (...) as a look-ahead, capturing the text between the delimiters into capture 1 or 2. Then it captures the same text again, skipping the delimiters, by back-referencing \1\2, relying on a back-reference to a non-participating capture matching the empty string.
This way, the same string is captured into capture 3, which is always participating.
It's probably fairly efficient. The back-reference to the same position should match in no time.
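A minimal sketch of how that could be used from JavaScript, assuming you only want the first match in a string. The names bracketOrParen and innerText are made up for this example; only the pattern comes from the above.

```javascript
// Captures 1 and 2 are filled by the look-ahead; capture 3 is the result.
const bracketOrParen = /(?=\[(.+?)\]|\((.+?)\))[(\[](\1\2)[)\]]/;

// Hypothetical helper: text between the first matching pair, or null.
function innerText(input) {
  const match = bracketOrParen.exec(input);
  return match === null ? null : match[3];
}

console.log(innerText("see [abc] here"));   // abc
console.log(innerText("call (xyz) now"));   // xyz
console.log(innerText("mixed (abc] pair")); // null
```

Whichever delimiter pair matched, the text ends up in capture 3, so the caller never has to check captures 1 and 2.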
If that's not good enough, and you want a RegExp with precisely one capture, which is the text between [..] or (..), then I'd try look-behinds instead:
[(\[](.+?)(?:(?=\))(?<=\(\1)|(?=\])(?<=\[\1))
It matches a [ or (, then tries to find a capture after it which is followed by either ) or ], and then does a backwards check to see if the leading delimiter was the matching ( or [ respectively.
Unlikely to be as efficient, but it only matches (...) and [...], and captures what's between them in the single capture.
If the look-behind back-reference to the same position is efficient (not as guaranteed, but possible), it's potentially not bad.
If it's not efficient, it may do a lot of looking back, but only when it sees a possible closing ) or ].
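Here is a small sketch of the single-capture version in use. It assumes an engine with look-behind support (ES2018 or later), and the names singleCapture and innerTextSingle are invented for the example.

```javascript
// One capture only: the text between a matching pair of delimiters.
const singleCapture = /[(\[](.+?)(?:(?=\))(?<=\(\1)|(?=\])(?<=\[\1))/;

// Hypothetical helper: returns capture 1 of the first match, or null.
function innerTextSingle(input) {
  const match = singleCapture.exec(input);
  return match === null ? null : match[1];
}

console.log(innerTextSingle("see [abc] here")); // abc
console.log(innerTextSingle("(a]b)"));          // a]b
console.log(innerTextSingle("(abc]"));          // null
```

Note that the closing delimiter is only checked by a look-ahead, so it is not part of the overall match. For "see [abc] here" the full match is "[abc", while capture 1 is "abc".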
It can also be converted to a RegExp which matches only the text you want, so "capture zero" is the result (as well as capture 1, which it uses internally), by matching the leading [ or ( with a look-behind:
(?<=[(\[])(.+?)(?:(?=\))(?<=\(\1)|(?=\])(?<=\[\1))
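Since the whole match is now the wanted text, this version works well with a global search. A rough sketch, assuming look-behind and String.prototype.matchAll are available; innerOnly and allInnerTexts are names made up for the example.

```javascript
// The g flag is required by matchAll; the match itself is the inner text.
const innerOnly = /(?<=[(\[])(.+?)(?:(?=\))(?<=\(\1)|(?=\])(?<=\[\1))/g;

// Hypothetical helper: collects "capture zero" of every match.
function allInnerTexts(input) {
  return Array.from(input.matchAll(innerOnly), (m) => m[0]);
}

console.log(allInnerTexts("a [b] c (d) e")); // [ 'b', 'd' ]
console.log(allInnerTexts("(x] [y]"));       // [ 'y' ]
```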
(Look-behinds and look-aheads really are the gift that keeps giving when it comes to RegExp power. Both look-ahead and look-behind allow you to match the same sub-string more than once, using different RegExps, and even allow the later ones to refer to captures from earlier matches.)