This is my text:

This [is] some [d[um]my] text. How to [se[le]ct i[nn]er b]race wi[th[out s]ele[ct]ing th]e outer b[race [in] a tex]t

The required regex must highlight the above text as shown below:

This [is] s[o]me [d[um]my] text. How to [se[le]ct i[nn]er b]race wi[th[out s]ele[ct]ing th]e outer b[race [in] a tex]t

As you can see, the regex must highlight only the braces that have parent braces. Braces that do not have parent braces must not be selected.

For example, [is] and s[o]me have no parent brace, hence they must not be highlighted. But in [d[um]my] and [se[le]ct i[nn]er b]race the inner braces ([um], [le], [nn]) have a parent brace, hence those brackets along with the text inside them must be selected.

I have tried the below PCRE regex:

\[[^\[]+?]

https://regex101.com/r/xR0wM3/12

But it also highlights the braces that do not have an outer brace. That is the only issue to be solved; all the other highlighting works perfectly. The regex must be changed so that it does not select braces without a parent brace, i.e. in the example [is] is being selected, which is outside the scope of the requirement. Once this is solved, my requirement is met.
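
To make the problem concrete, here is a minimal sketch that runs the attempted pattern in JavaScript. The pattern uses nothing PCRE-specific, so it behaves the same way here; the variable names are made up for illustration. The top-level [is] shows up in the result next to the nested groups.

```javascript
// The sample text from the question.
var sample = 'This [is] some [d[um]my] text. How to [se[le]ct i[nn]er b]race wi[th[out s]ele[ct]ing th]e outer b[race [in] a tex]t';

// The attempted pattern, copied as-is: "[", then as few non-"[" characters as possible, then "]".
var attempted = /\[[^\[]+?]/g;

// Collect every match; nothing in the pattern checks for an enclosing bracket.
var attemptedMatches = sample.match(attempted);

console.log(attemptedMatches);
// → [ '[is]', '[um]', '[le]', '[nn]', '[out s]', '[ct]', '[in]' ]
```

The first entry, [is], is the unwanted one: it has no parent brace but still satisfies the pattern.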

klenium
  • You mean PHP, right? Please only keep the relevant tags. – Wiktor Stribiżew Aug 17 '15 at 09:59
  • Quote the regex you've tried **in** your question, not just linked. Links rot, and people shouldn't have to go off-site to see what you've tried. (Not that a regex101 link isn't a useful *addition*.) *Edit*: I've done it for you this time. – T.J. Crowder Aug 17 '15 at 09:59
  • You've tagged `javascript` and `php`. You need to choose, their regular expression engines are similar, but different, and one of the differences (PHP has lookbehind, JavaScript doesn't) may well be relevant. Please also don't tag wildly: `performance` is clearly not a relevant tag, nor is `html`. (I've removed them.) – T.J. Crowder Aug 17 '15 at 10:00
  • it will select only inner braces – Raghavendra Aug 17 '15 at 10:02
  • I would do it in two steps: match parent braces, then match inner braces. Now the question is: are you using PHP or JS? If you're using PHP you might use a recursive pattern, otherwise you will need to write a parser in JS. In either case you could write a parser. Have fun. – HamZa Aug 17 '15 at 10:08
  • @NarendraSisodia that will not help – Raghavendra Aug 17 '15 at 10:12
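
The parser route mentioned in the comments can be sketched as a single pass that tracks the currently open brackets. This is only an illustration, not code from the thread: the function name nestedBrackets is made up, and the sketch assumes the brackets in the input are balanced.

```javascript
// Hypothetical helper: returns every [...] group that sits inside another [...] group.
function nestedBrackets(str) {
    var starts = []; // indices of the "[" characters that are still open
    var found = [];
    for (var i = 0; i < str.length; i++) {
        var ch = str.charAt(i);
        if (ch === '[') {
            starts.push(i);
        } else if (ch === ']' && starts.length > 0) {
            var start = starts.pop();
            // If another "[" is still open, the group just closed has a parent.
            if (starts.length > 0) {
                found.push(str.slice(start, i + 1));
            }
        }
    }
    return found;
}

var parsed = nestedBrackets('This [is] some [d[um]my] text. How to [se[le]ct i[nn]er b]race wi[th[out s]ele[ct]ing th]e outer b[race [in] a tex]t');
console.log(parsed);
// → [ '[um]', '[le]', '[nn]', '[out s]', '[ct]', '[in]' ]
```

With more than one nesting level this sketch would also report the intermediate groups, so it matches the requirement only under the single-level assumption.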

1 Answer

Keeping in mind that

There will be only one parent brace i.e only one nested level.

You can use the following regex in PHP:

(?:\[|(?!^)\G).*?(\[[^\[\]]*\])

See demo

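The (?:\[|(?!^)\G) part makes sure we only match [...] that are inside another pair of [...]: a match can start either at an opening [ or, through \G, exactly where the previous match ended. The (?!^) look-ahead stops \G from matching at the very start of the string.

A slightly more optimized variant drops the capture group and uses \K, which omits everything matched up to that point from the reported match: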

(?:\[|(?!^)\G)[^\[\]]*\K\[[^\[\]]*\]

See demo 2

An approach for JavaScript includes 2 steps:

  • First, we extract the substrings that have parent brackets with var re = /[^\[]+(\[(?:[^\[\]]|\[[^\[\]]*\])*\])/g;
  • Then, we extract all inner [...] substrings from those chunks with rx = /\[[^\[\]]+\](?=(?:[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\]))/g;

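// Step 1: a run of non-"[" characters followed by a top-level [...] group (captured),
// which may contain one level of nested [...] groups.
var re = /[^\[]+(\[(?:[^\[\]]|\[[^\[\]]*\])*\])/g;
// Step 2: inside a captured group, match only the [...] that are still followed by a closing "]".
var rx = /\[[^\[\]]+\](?=(?:[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\]))/g;
var str = 'This [is] some [d[um]my] text. How to [se[le]ct i[nn]er b]race wi[th[out s]ele[ct]ing th]e outer b[race [in] a tex]t';
var m, n;

while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) {
        re.lastIndex++; // guard against zero-length matches
    }
    rx.lastIndex = 0; // restart the inner search for each new chunk
    while ((n = rx.exec(m[1])) !== null) {
        if (n.index === rx.lastIndex) {
            rx.lastIndex++;
        }
        document.getElementById("r").innerHTML += n[0] + "<br/>";
    }
}
<div id="r"></div>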

A couple of words about the second regex: the look-ahead (?=(?:[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\])) makes sure that the match is followed by any number of characters other than [ and ] ([^\[\]]*) or [...] substrings (\[[^\[\]]*\]), and then by a closing ]. It could be written as (?=(?:[^\[\]]|\[[^\[\]]*\])*\]), but the unrolled version I am using is more efficient, though it looks very untidy. (This is JS, sorry.)
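
As a quick check of that claim, the following sketch runs both forms of the look-ahead against one chunk and against a top-level-only group. The variable names are made up for illustration, and only the matches are compared, not the speed.

```javascript
// Unrolled look-ahead, as used in the snippet above.
var unrolled = /\[[^\[\]]+\](?=(?:[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\]))/g;
// Compact look-ahead using a plain alternation.
var compact = /\[[^\[\]]+\](?=(?:[^\[\]]|\[[^\[\]]*\])*\])/g;

// One chunk as produced by the first step.
var chunk = '[se[le]ct i[nn]er b]';
var fromUnrolled = chunk.match(unrolled);
var fromCompact = chunk.match(compact);

console.log(fromUnrolled); // → [ '[le]', '[nn]' ]
console.log(fromCompact);  // → [ '[le]', '[nn]' ]

// A group with no parent has no closing "]" after it, so neither form matches.
console.log('[is]'.match(unrolled)); // → null
console.log('[is]'.match(compact));  // → null
```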

Wiktor Stribiżew