Recursive pattern in regex for latex commands

Question

I need to capture the text inside the \textbf{} command. \textbf can contain multiple nested braces, as in the examples below:

\textbf{adadasas}

\textbf{adadasas \textit{xxx} adasda {xxx}}

\textbf{adadasas {} {} {} dxxxx}

I want to capture the value inside \textbf{...}.

I tried this regex in Python: {([^{}]*+(?:(?R)[^{}]*)*+)} (from: Recursive pattern in regex)

x = regex.findall(r'\\textbf{([^{}]*+(?:(?R)[^{}]*)*+)}',cnt)

I am not getting all the values. When I remove \\textbf from the regex, it captures all the occurrences.

Please suggest how to write a regex for this.

What is the expected result for each of the examples? — Tom McLean, Jul 08 '22 at 10:24

The fourth bird · Accepted Answer · 2022-07-08T10:54:32.240

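You can recurse into the first capture group with (?1) instead of recursing into the whole pattern with (?R), and capture what is inside the {} with group 2. With (?R), every nested brace pair would also have to be preceded by \textbf, which is why nested {...} fail to match.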

\\textbf({([^{}]*+(?:(?1)[^{}]*)*+)})

\\textbf Match \textbf literally
( Capture group 1
- { Match a { char
- ( Capture group 2
  - [^{}]*+ Optionally match any chars except { and }, using a possessive quantifier
  - (?: Non-capture group, repeated as a whole
    - (?1)[^{}]* Recurse into group 1, then optionally match any chars except curly braces
  - )*+ Close the non-capture group and optionally repeat it, using a possessive quantifier
- ) Close group 2
- } Match a } char
) Close group 1
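
To see the difference between (?R) and (?1) in practice, here is a small sketch that runs both patterns over the three examples from the question. The variable names (whole, group, with_r, with_1) are only illustrative, and it assumes the third-party regex module is installed.

```python
import regex

text = "\n".join([
    r"\textbf{adadasas}",
    r"\textbf{adadasas \textit{xxx} adasda {xxx}}",
    r"\textbf{adadasas {} {} {} dxxxx}",
])

# (?R) re-enters the entire pattern, so every nested brace pair would
# have to be preceded by \textbf as well.
whole = r"\\textbf{([^{}]*+(?:(?R)[^{}]*)*+)}"

# (?1) re-enters only group 1, which is the brace pair itself.
group = r"\\textbf({([^{}]*+(?:(?1)[^{}]*)*+)})"

with_r = regex.findall(whole, text)
with_1 = [m.group(2) for m in regex.finditer(group, text)]

print(with_r)
print(with_1)
```

The (?R) version should only find the first example, which has no nested braces, while the (?1) version should find all three.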


Note that if you use regex.findall, you get the values of all capture groups for each match, and this pattern has 2 capture groups.
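
As a rough illustration of that point, the sketch below calls findall on a single made-up input (the string in text is an assumption, not one of the question's examples) and then keeps only the second element of each tuple.

```python
import regex

pattern = r"\\textbf({([^{}]*+(?:(?1)[^{}]*)*+)})"
text = r"\textbf{a {b} c}"

# With two capture groups, findall returns one 2-tuple per match:
# (group 1 including the outer braces, group 2 without them).
pairs = regex.findall(pattern, text)
print(pairs)

# Keep only group 2, the text between the outer braces.
inner = [g2 for _g1, g2 in pairs]
print(inner)
```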

You can use regex.finditer instead and get the group 2 value:

import regex

pattern = r"\\textbf({([^{}]*+(?:(?1)[^{}]*)*+)})"

cnt = (
    "\\textbf{adadasas}\n"
    "\\textbf{adadasas \\textit{xxx} adasda {xxx}}\n"
    "\\textbf{adadasas {} {} {} dxxxx}\n"
    "{adadasas {} {} {} dxxxx}"  # no \textbf prefix, so this line is not matched
)

# finditer yields match objects, so a single group can be picked out
for match in regex.finditer(pattern, cnt):
    print(match.group(2))

Output

adadasas
adadasas \textit{xxx} adasda {xxx}
adadasas {} {} {} dxxxx
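
If the same extraction is needed for other commands, the pattern can be wrapped in a small helper. This is only a sketch: command_args is a hypothetical name, not part of the regex module, and the braces are escaped here purely for readability.

```python
import regex

def command_args(name, text):
    # Return the brace-balanced argument of every \name{...} found in text.
    # Same structure as the pattern above; only the command name varies.
    pattern = r"\\" + regex.escape(name) + r"(\{([^{}]*+(?:(?1)[^{}]*)*+)\})"
    return [m.group(2) for m in regex.finditer(pattern, text)]

sample = r"\textbf{adadasas \textit{xxx} adasda {xxx}}"
bold = command_args("textbf", sample)
italic = command_args("textit", sample)

print(bold)
print(italic)
```

Matches do not overlap, so a \textbf nested inside another \textbf is reported only as part of the outer one.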

OP needs to understand that an extra group around the curly braces in the pattern is required; it is not the first group in the original pattern that is recursed. — Wiktor Stribiżew, Jul 08 '22 at 10:42
@WiktorStribiżew Yes, I have added a breakdown to show where the groups are. — The fourth bird, Jul 08 '22 at 12:33
