
Suppose I have the following text:

Yes: [x]
Yes: [  x]
Yes: [x  ]
Yes: [  x  ]
No: [
No: ]

I am interested in capturing the square brackets [ and ] that enclose an x with a variable amount of horizontal whitespace on either side of the x. The part I am struggling with is that both brackets must be captured into a group with the same ID (i.e., $1).

I started with a combination of positive lookahead and lookbehind assertions using the following regex:

\[(?=\h*x)|(?<=x)\h*\K\]

This produces the following matches (see the demo, with the extended flag enabled for clarity):

[Image: example of the first attempt]

Then I tried placing a capturing group around the whole expression, but the match then extends over the horizontal whitespace that follows the positive lookbehind, (?<=x)\h*, as shown below (see also the demo).

[Image: example of the second attempt]

I am using Oniguruma regular expressions and the PCRE flavor. Do you have any idea whether, and how, this can be done?

Mihai
  • This is not possible in any regex flavor, as you cannot place disjoint streaks of text into a single capturing group. – Wiktor Stribiżew Dec 16 '21 at 09:25
  • @WiktorStribiżew, thanks, that explains a lot of things... – Mihai Dec 16 '21 at 09:30
  • You might use a branch reset group with an alternation: `(?|(\[)(?=\h*x\h*])|(?<=\[)\h*x\h*(]))` (https://regex101.com/r/R8OFr9/1). It would be the same group but a different match. – The fourth bird Dec 16 '21 at 09:32
  • @Thefourthbird, this is exactly what I have been looking for! Superb! Would it be possible to explain a bit the alternation approach in an answer? Then I can also accept it! – Mihai Dec 16 '21 at 09:35
  • So, "both angular brackets must be part of this capturing group" is wrong? Did you mean to write "both angular brackets must be captured into a group with the same ID"? – Wiktor Stribiżew Dec 16 '21 at 09:37
  • @WiktorStribiżew, yes, you are right, that is what I meant. I apologize for the misuse of terminology. I will correct the mistake in the question. – Mihai Dec 16 '21 at 09:38
  • Thanks for pointing to the other question. Unfortunately, the title of that question is too specific and did not show on my radar while searching for a solution. – Mihai Dec 16 '21 at 09:45
  • That duplicate is too broad for a specific question like this. – The fourth bird Dec 16 '21 at 09:45

1 Answer

You could make use of a branch reset group:

(?|(\[)(?=\h*x\h*])|(?<=\[)\h*x\h*(]))
  • (?| Branch reset group
    • (\[)(?=\h*x\h*]) Capture [ in group 1, asserting x between optional horizontal whitespace chars to the right followed by ]
    • | Or
    • (?<=\[)\h*x\h*(]) Assert [ directly to the left, then match x between optional horizontal whitespace chars and capture ] in group 1 (the branch reset makes both alternatives share the same group number)
  • ) Close branch reset group
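
For readers without a PCRE engine at hand, here is a minimal sketch of the same idea in Python. Note the assumptions: Python's built-in `re` module supports neither the branch reset group `(?|...)` nor `\h`, so this is an emulation rather than the pattern above. It uses `[ \t]` in place of `\h`, keeps two ordinary capturing groups, and reads whichever one took part in the match via `m.lastindex`. The helper name `captured_brackets` and the `sample` string are made up for illustration.

```python
import re

# Emulation of the PCRE pattern: no (?|...) and no \h in Python's `re`,
# so the two branches keep separate group numbers (1 and 2) and
# [ \t] stands in for horizontal whitespace.
PATTERN = re.compile(
    r"(\[)(?=[ \t]*x[ \t]*\])"    # branch 1: capture "[" only if "x" and "]" follow
    r"|"
    r"(?<=\[)[ \t]*x[ \t]*(\])"   # branch 2: right after "[", consume x, capture "]"
)


def captured_brackets(text):
    """Return (bracket, offset) for every bracket captured in text."""
    results = []
    for m in PATTERN.finditer(text):
        # Exactly one group participates in each match; lastindex tells which.
        idx = m.lastindex
        results.append((m.group(idx), m.start(idx)))
    return results


sample = "Yes: [x]\nYes: [  x]\nYes: [x  ]\nYes: [  x  ]\nNo: [\nNo: ]\n"
for bracket, offset in captured_brackets(sample):
    print(offset, bracket)
```

With a real branch reset group, the `m.lastindex` lookup would be unnecessary, since both brackets would be available as group 1. As in the PCRE version, each bracket still arrives in a separate match.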

Regex demo

The fourth bird
  • I know it's not in line with the guidelines, but thank you, this is a powerful approach to be aware of. – Mihai Dec 16 '21 at 09:54