I am trying to extract something with the regex:

Pattern logEntry = Pattern.compile("digraph Checker \\{(.*)\\}");

for the block of text:

{ /*uninterested in this*/ 
"
digraph Checker 
{ 
/*bunch of stuff*/
{
/*bunch of stuff*/
}
{
/*bunch of stuff*/
}
{
/*bunch of stuff*/
}
/*bunch of stuff*/
} //first most curly brace ends, would want the regex to filter out till here, incl. the braces
"
}

and expect the output to be:

digraph Checker 
{ 
/*bunch of stuff*/
{
/*bunch of stuff*/
}
{
/*bunch of stuff*/
}
{
/*bunch of stuff*/
}
/*bunch of stuff*/
}

but can't seem to get rid of the last

"
}

Is there a way that I could extract this?

newbie
  • In truth, the Java regex engine doesn't do balanced text, so there is nothing you can do to solve this with a regex. You could if it were the .NET, PCRE, or Perl engine. –  Jul 15 '15 at 22:41

2 Answers

You can use this regex:

Pattern logEntry = Pattern.compile("digraph Checker\\s+\\{((?:[^{]*\\{[^}]*\\})*[^}]*)\\}");
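
To show how such a pattern might be applied, here is a minimal sketch. The class name `DigraphRegexDemo` and the helper `extract` are made up for illustration. The literal braces outside character classes are written escaped, on the assumption that `java.util.regex` rejects a bare `{` there. It only copes with one level of nested braces, as in the sample text; deeper nesting is not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigraphRegexDemo {
    // Literal braces outside character classes are escaped (assumption:
    // Java reports an unescaped '{' there as an illegal repetition).
    // The group allows any number of non-nested {...} blocks inside the outer pair.
    static final Pattern LOG_ENTRY = Pattern.compile(
            "digraph Checker\\s+\\{((?:[^{]*\\{[^}]*\\})*[^}]*)\\}");

    /** Returns the first "digraph Checker { ... }" block, or null if none is found. */
    static String extract(String text) {
        Matcher m = LOG_ENTRY.matcher(text);
        // group() is the whole match, braces included; group(1) would be the body only.
        return m.find() ? m.group() : null;
    }
}
```

Calling `DigraphRegexDemo.extract(text)` on the block from the question should return everything from `digraph Checker` through its own closing brace, without the trailing quote and outer brace.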

RegEx Demo

anubhava
  • This doesn't work. It's not balanced text; Java doesn't do that. –  Jul 15 '15 at 22:39
  • I will let the OP decide that after looking at the demo. In any case, it is not a generic regex for matching balanced brackets, just one specific to the OP's problem. – anubhava Jul 16 '15 at 06:02
  • It wouldn't be so bad if it weren't so grotesquely wrong. It matches `{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{}}` and `{{}}}}}}}}}}}}}}}}}}}}}}}}{{{{{}}` –  Jul 16 '15 at 16:49

@anubhava showed you a clever (but complicated) regex specifically adapted to your example. But as @sln said, regexes are not well suited to balanced elements. That is why dedicated libraries such as JSoup were developed to process XML, which makes extensive use of balanced elements.

So even if it is not the answer you expected, the rule here is: do not even try to use Java regexes to parse balanced elements. You may find patterns that (seem to) work in some cases, but they will break on a slightly different input.

The best thing to do here is to build a dedicated parser, or to use one of the parser builders listed in "Yacc equivalent for Java". According to that page, ANTLR should be the most popular Java tool for lexing/parsing. But if you are used to Lex/Yacc, you could also have a look at JFlex and BYACC/J, which handle that kind of parsing.
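
For input as simple as the sample, a dedicated parser can be as small as a brace counter. The following is a minimal sketch, not a full parser: the class `BalancedBlockExtractor` and the method `extractBlock` are hypothetical names, and it assumes that no braces appear inside comments or string literals within the block (they would be counted like any other brace).

```java
class BalancedBlockExtractor {
    /**
     * Returns the text from the first occurrence of header through the '}'
     * that balances the first '{' after it, or null if the header is missing
     * or the braces never balance.
     */
    static String extractBlock(String text, String header) {
        int start = text.indexOf(header);
        if (start < 0) {
            return null;
        }
        int open = text.indexOf('{', start + header.length());
        if (open < 0) {
            return null;
        }
        int depth = 0;
        for (int i = open; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;                  // entering a nested block
            } else if (c == '}' && --depth == 0) {
                // depth is back to zero: this brace closes the outermost block
                return text.substring(start, i + 1);
            }
        }
        return null;                      // unbalanced input
    }
}
```

Unlike the regex approach, `BalancedBlockExtractor.extractBlock(text, "digraph Checker")` keeps working when the inner blocks are nested to any depth.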

Serge Ballesta