Need a regex expert to match nested brackets

Question

Given this string:

Démontrer par récurrence que pour tout entier naturel n,
\(\displaystyle{1^2+2^2+\ldots+n^2=\sum_{k=0\dfrac dsdq ds}rr^{n}k^2=\dfrac{n(n+1)(2n+1)}{6}}\)Démontrer par récurrence que pour tout entier naturel n,\(\displaystyle{1^2+2^2+\ldots+n^2=\sum_{k=0\dfrac{test} fdfd}^{n}k^2=\dfrac{n(n+1)(2n+1)}{6}}\)

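I need to replace \dfrac with \frac, but only inside _{...} or ^{...} (subscripts and superscripts), where the braces may themselves contain nested braces.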

I tried many patterns, without success, such as:

/(_|\^)\{(.*[^{}])(\\dfrac)(.*[^{}])}/gU

No expert can help you if you are using a regex engine that does not support recursion or balancing groups. TL;DR: what tool/language are you using? — HamZa, Apr 13 '15 at 12:03
I suggest to use the classical way to match nested brackets https://regex101.com/r/tO9fY2/1 with a callback `preg_replace_callback()` https://eval.in/private/ee0ac1d7c07cad — HamZa, Apr 13 '15 at 12:39

Casimir et Hippolyte · Accepted Answer · 2015-04-13T12:48:14.000

You need preg_replace_callback with a pattern that matches the content between _{ and } or ^{ and }, including nested curly brackets, and a callback function that replaces every occurrence of \dfrac in the match. Example:

$pattern = '~[_^]({[^{}]*(?:(?1)[^{}]*)*})~';

$result = preg_replace_callback($pattern,
    function ($m) { return str_replace('\dfrac', '\frac', $m[0]); },
    $text);

pattern details:

~              # pattern delimiter
[_^]           # _ or ^
(              # open the capture group 1
    {
    [^{}]*     # all that is not a curly bracket
    (?:        # open a non capturing group
        (?1)   # the recursion is here:
               # (?1) refers to the subpattern contained in capture group 1
               # (so the current capture group)
        [^{}]* # all that is not a curly bracket (after the nested group)
    )*         # repeat the non capturing group as needed
    }
)              # close the capture group 1
~
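
As a sketch, the annotated layout above can be passed to PCRE almost verbatim by adding the x (extended) modifier, which ignores unescaped whitespace and # comments outside character classes. The nowdoc label REGEX, the shortened sample string, and the variable names are illustrative assumptions; the pattern and the callback are the ones shown above.

```php
<?php
// Same pattern as above, written in free-spacing mode (x modifier).
// Comments inside the pattern must not contain the ~ delimiter.
$pattern = <<<'REGEX'
~
[_^]            # _ or ^
(               # open capture group 1
    {
    [^{}]*      # all that is not a curly bracket
    (?:
        (?1)    # recurse into group 1, i.e. a nested {...}
        [^{}]*
    )*
    }
)
~x
REGEX;

// Shortened sample from the question (single quotes keep backslashes literal).
$text = '\sum_{k=0\dfrac{test} fdfd}^{n}k^2=\dfrac{n(n+1)(2n+1)}{6}';

$result = preg_replace_callback(
    $pattern,
    function ($m) { return str_replace('\dfrac', '\frac', $m[0]); },
    $text
);

echo $result, "\n";
// prints: \sum_{k=0\frac{test} fdfd}^{n}k^2=\dfrac{n(n+1)(2n+1)}{6}
```

Only the \dfrac inside the subscript is rewritten; the one after the = sign is not preceded by _ or ^ and is left alone.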

Note: if curly brackets are not always balanced, you can change quantifiers to possessive to prevent too much backtracking and to make the pattern fail faster:

$pattern = '~[_^]({[^{}]*+(?:(?1)[^{}]*)*+})~';

Alternatively (and arguably better), you can use an atomic group:

$pattern = '~[_^]({(?>[^{}]*(?:(?1)[^{}]*)*)})~';
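
A minimal sketch comparing the three variants: on balanced input they should give the same result, and on input whose outer brace is never closed none of them matches, so the text comes back unchanged. The sample strings and the names $patterns, $replace, and $results are assumptions made for illustration.

```php
<?php
$patterns = [
    'plain'      => '~[_^]({[^{}]*(?:(?1)[^{}]*)*})~',
    'possessive' => '~[_^]({[^{}]*+(?:(?1)[^{}]*)*+})~',
    'atomic'     => '~[_^]({(?>[^{}]*(?:(?1)[^{}]*)*)})~',
];

// Callback from the answer: rewrite \dfrac only inside the matched script.
$replace = function ($m) { return str_replace('\dfrac', '\frac', $m[0]); };

$balanced   = 'a_{\dfrac{x}{y}} + b^{\dfrac{1}{2}}';
$unbalanced = 'a_{\dfrac{x}{y} + b'; // the outer brace is never closed

$results = [];
foreach ($patterns as $name => $pattern) {
    $results[$name] = [
        preg_replace_callback($pattern, $replace, $balanced),
        preg_replace_callback($pattern, $replace, $unbalanced),
    ];
    echo $name, ': ', $results[$name][0], ' | ', $results[$name][1], "\n";
}
```

The variants differ in how much backtracking the engine performs before giving up on unbalanced input, not in what they match on these samples.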

Is there any possible reason to hide the explanation instead of inlining the comments with `/x` mode? Compressed line-noise style regexes are not maintainable. — tchrist, Apr 13 '15 at 13:11
@tchrist: I don't use the free-spacing/verbose/etc. mode for small patterns. Furthermore, there is no need to maintain this because, if you know the concept behind this kind of pattern, you are able to rewrite it from scratch to fit other needs, in the other case whatever the presentation, you can't do anything. — Casimir et Hippolyte, Apr 13 '15 at 13:57

Wiktor Stribiżew · Answer 2 · 2015-04-13T13:24:22.287

You can try this regex:

(?(DEFINE)                            # Definitions
(?<needle>\\dfrac(?=[^\}]*\}))    # What to search for
(?<skip>^[^\{]*\{|\}[^\{]*\{)               # What we should skip
)
(?&skip)(*SKIP)(*FAIL)                # Skip it
|
(?&needle)                            # Match it

PHP code:

$re = "/(?(DEFINE)                            # Definitions
        (?<needle>\\\\dfrac(?=[^\\}]*\\}))    # What to search for
        (?<skip>^[^\\{]*\\{|\\}[^\\{]*\\{)               # What we should skip
        )
        (?&skip)(*SKIP)(*FAIL)                # Skip it
        |
        (?&needle)                            # Match it/xm"; 
$str = "Démontrer par récurrence que pour tout entier naturel n,\n\dfrac\n\(\displaystyle{1^2+2^2+\ldots+n^2=\sum_{k=0\dfrac dsdq ds}rr^{n}k^2=\dfrac{n(n+1)(2n+1)}{6}}\)\nDémontrer par récurrence que pour tout entier naturel n,\n\n\(\displaystyle{1^2+2^2+\ldots+n^2=\sum_{k=0\dfrac{test} fdfd}^{n}k^2=\dfrac{n(n+1)(2n+1)}{6}}\)\n\n\dfrac"; 
$subst = "\\frac"; 
$result = preg_replace($re, $subst, $str);

edited Apr 13 '15 at 13:24

answered Apr 13 '15 at 12:35

One fail: https://regex101.com/r/rE0nN5/3 . IMHO the skip/fail trick isn't really useful in this particular case – HamZa Apr 13 '15 at 12:51
@HamZa: It works now :) What have you been thinking of? Please post. – Wiktor Stribiżew Apr 13 '15 at 12:59
Thinking about your solution? It seems a bit clumsy/cryptic. Well it's regex after all lol. I'm still not convinced about this solution. It works though. Just be careful when you use `^`, you probably want to use the `m` modifier and prevent the negative character classes from matching newlines in some contexts. +1 for now – HamZa Apr 13 '15 at 13:09
@HamZa: Thank you for your comments. I see your point, and added the `m` inline option to enforce multiline behavior. – Wiktor Stribiżew Apr 13 '15 at 13:25
