I found this question about using capture groups with the \K
reset match (i.e., not sure if that's the correct name), but it does not answer my query.
Suppose I have the following string:
ab
With the following regex
a\Kb
the output is, as expected, b
:
However, when adding a capture group (i.e., $1
) using the regex
(a\Kb)
, group $1
returns ab
and not a
:
Given the following string:
ab
cd
Using the regex
(a\Kb)|(c\Kd)
I would hope group $1
to contain b
and group $2
to contain d
, but that is not the case as it can be seen below:
I tried Wiktor Stribiżew's answer that points to using a branch reset group:
(?|a\Kb)|(?|c\Kd)
Which produces:
However, now the matches are both part of group $0
, whereas I require them to be part of group $1
and $2
, respectively. Do you have any ideas on how this can be achieved? I am using Oniguruma regular expressions and the PCRE
flavor.
Update based on the comments below.
The example above was meant to be easy to understand and reproduce. @Booboo pointed out that a non-capturing group does the trick, i.e.,:
(?:a\K(b))|(?:c\K(d))
Produce the output:
However, when applied to another example it fails. Therefore, for clarity, I am extending this question to cover the more complicated scenario discussed in the comments.
Suppose I have the following text in a markdown
file:
- [x] Example task. | Task ends. [x] Another task.
- [x] ! Example task. | This ends. [x] ! Another task.
This is a sentence. [x] Task is here.
Other text. Another [x] ! Task is here.
| | Task name | Plan | Actual | File |
| :---- | :-------------| :---------: | :---------: | :------------: |
| [x] | Task example. | 08:00-08:45 | 08:00-09:00 | [[task-one]] |
| [x] ! | Task example. | 08:00-08:45 | 08:00-09:00 | [[task-one]] |
I am interested in a single regex
expression with two capture groups as follows:
group
$1
(i.e., see selection below):group
$2
(i.e., see selection below):
I have the following regex
(i.e., see demo here) that works when evaluated individually, but not when used inside a capture group:
- group
$1
:- outside the table:
[^\|\s]\s*\[x\]\s*\K[^!|\n]*
- inside the table:
(?:\G(?!\A)\||(?<=\[x]\s)\s*\|)\K[^|\n]*(?=\|)
- outside the table:
- group
$2
:- outside the table:
[^\|\s]\s*\[x\]\s*\!\s*\K[^|\n]*
- inside the table:
(?:\G(?!\A)\||(?<=\[x]\s)\s*\!\s*\|)\K[^|\n]*(?=\|)
- outside the table:
The problem I am experiencing is when combining the expressions above.
Pseudo regex
:
([x] outside|[x] inside)|([x] ! outside|[x] ! inside)
Actual regex
:
([^\|\s]\s*\[x\]\s*\K[^!|\n]*|(?:\G(?!\A)\||(?<=\[x]\s)\s*\|)\K[^|\n]*(?=\|))|([^\|\s]\s*\[x\]\s*\!\s*\K[^|\n]*|(?:\G(?!\A)\||(?<=\[x]\s)\s*\!\s*\|)\K[^|\n]*(?=\|))
Which produces (i.e., as in the demo linked above):
The regex
for the matches inside the table is based on Wiktor Stribiżew's answer and explained here.