0

I have the following string :

bla {{bla  {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla

I would like to match

{{bla  {{bla bla {{afsaasg}} }} blabla}}

with a regex.

but my regex

{{(.*?)}}

matches

{{bla  {{bla bla}}

anyone can help ?

Additional Info : I expect to have not more then 2 brackets at the same time.

Finally I solved this with an own Java fuction. Perhabs this will help someone :

public static ArrayList<String> getRecursivePattern(String sText, String sBegin, String sEnd) {

        ArrayList<String> alReturn = new ArrayList<String>();

        boolean ok1 = true;
        boolean ok2 = true;

        int iStartCount = 0;
        int iEndCount = 0;

        int iStartSearching = 0;

        while (ok1) {
            int iAnfang = sText.indexOf(sBegin, iStartSearching);

            ok2 = true;
            if (iAnfang > -1) {
                while (ok2) {

                    int iStartCharacter = sText.indexOf(sBegin, iStartSearching);
                    int iEndCharacter = sText.indexOf(sEnd, iStartSearching);

                    if (iEndCharacter == -1) {
                        // Nothing found . stop
                        ok2 = false;
                        ok1 = false;

                    } else if (iStartCharacter < iEndCharacter && iStartCharacter != -1) {
                        // found startpattern
                        iStartCount = iStartCount + 1;
                        iStartSearching = iStartCharacter + sBegin.length();
                    } else if (iStartCharacter > iEndCharacter && iEndCharacter != -1 || (iStartCharacter == -1 && iEndCharacter != -1)) {
                        iEndCount = iEndCount + 1;
                        iStartSearching = iEndCharacter + sEnd.length();

                    } else {
                        if (iStartCharacter < 0) {
                            // No End found . stop
                            ok2 = false;
                        }
                    }
                    if (iEndCount == iStartCount) {
                        // found the pattern
                        ok2 = false;
                        // cut
                        int iEnde = iStartSearching;// +sEnd.length();
                        String sReturn = sText.substring(iAnfang, iEnde);
                        alReturn.add(sReturn);
                    }
                }
            } else {
                ok1 = false;
            }
        }

        return alReturn;
    }

I call it:

    ArrayList<String> alTest=getRecursivePattern("This {{ is a {{Test}} bla }}","{{","}}");
    System.out.println(" sTest : " + alTest.get(0));
mcfly soft
  • 11,289
  • 26
  • 98
  • 202
  • What language are you using for this? – anubhava Jan 10 '15 at 10:36
  • Do you want the regex to match _only_ nested structures? – Aran-Fey Jan 10 '15 at 10:42
  • 1
    Regex is not powerful enough to solve this in general case (although you can construct increasingly ugly expressions to parse it to any specific level of nesting). – Sergey Kalinichenko Jan 10 '15 at 10:45
  • 1
    @Avinash : Thanks a lot. {{(?:[^{}]|(?R))*}} is exactly what is working . Can you answer and I will accept? – mcfly soft Jan 10 '15 at 10:47
  • see here for the regex explanation http://www.regular-expressions.info/recurse.html – Avinash Raj Jan 10 '15 at 10:51
  • @AvinashRaj: Well this will parse also `{{{foo}}`: it matches anything that has `{{` before `}}`. (talking about `{{.*}}`). – Willem Van Onsem Jan 10 '15 at 11:01
  • @CommuSoft at first, i suggested the above. Later he edited the question. So i suggested this `{{(?:[^{}]|(?R))*}}` – Avinash Raj Jan 10 '15 at 11:08
  • 1
    @AvinashRaj: But that's only supported in the P-languages (Perl, PHP, Ruby,...) that even support (by (negative) lookahead), *Turing complete* regexes, as far as I know the Java regex engine does not support these. And furthermore it's not really advisable to do so... – Willem Van Onsem Jan 10 '15 at 11:10
  • 2
    @user1344545: Recursive sub-pattern in Java? I believe that is only available in PCRE – anubhava Jan 10 '15 at 11:12
  • 1
    @CommuSoft: Finally I managed to test this in JAVA and you are right. In Java this is not supported. I did my own function in Java and added this to the question. Perhabs it can help somebody else. Thank you very much, this was new and interresting to me. – mcfly soft Jan 12 '15 at 12:29

3 Answers3

1

.NET has special support for nested item matching, so {{(?>[^\{\}]+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!))}} would do what you wanted in C# to any level of nesting, but not Java.

user1906580
  • 249
  • 2
  • 4
0

You can't do this with regular expressions. It the consequence of the pumping lemma. You need to use context-free grammar's, or perhaps use dedicated tools (like XML/DOM/... parsers).

You can indeed parse this for - say - three levels deep, but you can't let this work for an arbitrary number of levels. Even then, it's better to use context-free grammars (like a LALR compiler compiler), simply because "These are the tools designed to parse such structures.".

In other words, If one day, someone can enter {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{ bla }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}, and this is supposed to be valid, it will most likely fail.

One sidenote:

Say the level is for instance i levels deep, you can use a regex like:

  • for 1: .*?(.*?\{\{.*?\}\}.*?)*.*?
  • for 2: .*?(.*?\{\{.*?(.*?\{\{.*?\}\}.*?)*.*?\}\}.*?)*.*?
  • ...

But as you can see, the more deep you go, the longer the regex, and there is no way to parse them for arbitrary depth.

See also this discussion for people who want to parse XML/HTML - another recursive language - with regexes.

As you noted, some regular expression toolkits indeed provide tools to count things. These can be found in the P-languages (PHP, Perl,...). These aren't regular expressions (as defined by Kleene, see this Wikipedia-article about what a real regex is) strictly speaking, but simplified parsers. Because they don't describe a regular language. And - currently - not available in most regex libraries including Java. Some of the libraries even provide Turing complete parsers, parsers than can parse anything you can parse algorithmically, but it's not really recommended for advanced tasks...

Community
  • 1
  • 1
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • Noone told me this thousend times :-) and the anwer from Avinash works. I guess we both can learn a lot here. – mcfly soft Jan 10 '15 at 10:48
  • @user1344545: As argued before, it works if the maximum depth **is determined in advance**. The answer also shows how to construct such a regex for arbitrary depth. But it is mathematically *proven* you can't do this when the depth is arbitrary. – Willem Van Onsem Jan 10 '15 at 10:52
  • Thats wrong try {{(?:[^{}]|(?R))*}} in https://www.regex101.com/ with the text of bla {{bla {{bla bla {{ agagsd {{ sdgasdgasgd {{ sgsagdasd }} }} }} {{afsaasg}} }} blabla}} {{bla bla}} bla. You will see it works recursively. I am lucky. – mcfly soft Jan 10 '15 at 10:55
  • @user1344545: (1) that's not a regex in the definition by *Kleene*, these are simplified LALR parsers (so parsers, not regex-lexers), (2) these are not supported by Java, only by Perl and friends I think (these are even *Turing complete*) (3) these regexes turn out to be verry error prone. – Willem Van Onsem Jan 10 '15 at 10:57
  • 1
    @user1344545: an a sidenote on Avinash Raj's answer: this regex will accept all string with two `{{` before two `}}`, thus `{{{foo}}` (see the error) will be accepted as well. – Willem Van Onsem Jan 10 '15 at 10:58
  • 1
    Thanks a lot for the explanation. I believe you. In my case I expect to have only 2 brackets and it works fine for me. – mcfly soft Jan 10 '15 at 18:42
  • @user1344545: sure I don't say, you **shouldn't** use regexes, but be aware they are not the tool to parse nested structures in general. But in case you want for instance to parse *Java* or *XML*, simply don't do it with regexes... – Willem Van Onsem Jan 10 '15 at 22:24
  • Thanks for explanation. I guess I understood. I still have not tested this in Java, but in regex testers. I will update soon. – mcfly soft Jan 12 '15 at 11:25
0

Don't you need to escape the curly braces? I do in notepad++. Anyway, this should do it

\{\{[^{]+\{\{[^{}]+\}\}[^}]+\}\}
chiliNUT
  • 18,989
  • 14
  • 66
  • 106
  • Thanks . Yes this works for the special case. I enhanced the question, because I probably formulated wrong. And know it works with the solution from Avinash : {{(?:[^{}]|(?R))*}} – mcfly soft Jan 10 '15 at 10:49