3

I have a string like this:

(((a * b) + c) * d)

and I want to capture the parenthesized groups with a Java regex. I thought this simple regex

Pattern p = Pattern.compile("\\((.*)\\)",Pattern.DOTALL);

would do the job, but it does not.

What's wrong with that?

archangle

3 Answers

3

The language you're trying to define with your regular expression unfortunately smells non-regular, i.e. regular expressions are not suited to this type of expression. (To be precise, "well-balanced parentheses" is not something you can define with regular expressions.)

If, however, you simply want to find the substring a * b in your example, the following should do:

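// Needs java.util.regex.Pattern and java.util.regex.Matcher.
// [^()]* cannot cross a parenthesis, so only an innermost group can match.
Pattern p = Pattern.compile("\\(([^()]*)\\)");
Matcher m = p.matcher("(((a * b) + c) * d)");
if (m.find())
    System.out.println(m.group(1));   // prints "a * b"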
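
To see why the pattern from the question behaves differently, here is a minimal sketch that runs both patterns against the question's string and collects every capture. The class name `GreedyVsInnermost` and the helper `captures` are made up for illustration. The greedy `.*` consumes everything between the first `(` and the last `)`, so it yields a single match, while `[^()]*` can only match a group with no parentheses inside it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsInnermost {
    // Returns group(1) of every match of `regex` in `input`, in match order.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "(((a * b) + c) * d)";
        // Greedy .* spans from the first '(' to the last ')': one match only.
        System.out.println(captures("\\((.*)\\)", input));     // [((a * b) + c) * d]
        // [^()]* cannot cross a parenthesis: only the innermost group matches.
        System.out.println(captures("\\(([^()]*)\\)", input)); // [a * b]
    }
}
```

Neither pattern returns all three nested groups, which is the limitation discussed above.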
aioobe
  • +1 - It is definitely non-regular. Any grammar for expressions with balanced parentheses is inherently recursive, but strict REs restrict you to alternation and repetition – Stephen C Dec 03 '10 at 14:20
  • I'm not 100% sure that the OP is trying to figure out if the expression has well-balanced parentheses though. – aioobe Dec 03 '10 at 14:22
  • Thank you aioobe. I clearly missed the "regular" thing. It's, of course, a non-regular expression. I actually wanted to capture all the groups. That means I expected to get `((a * b) + c) * d`, `(a * b) + c` and `a * b`. – archangle Dec 03 '10 at 14:36
  • 1
    Ok. Well, I can only say, you're not alone. Many people have run into this before you :-) – aioobe Dec 03 '10 at 14:40
  • @aioobe: You are wrong. It is trivial to write [a pattern for balanced parens](http://stackoverflow.com/questions/4031112/regular-expression-matching/4034386#4034386) in any modern language. Unfortunately for the OP, with this as with so many other regex-related things, Java’s ludicrous antemillennial blinders render it wholly unsuitable for even such simple tasks as these. – tchrist Dec 03 '10 at 15:11
  • Ah, yes... it's even easier using aiooberegexps though, thanks to good old `\parbal` ;) it's a shame it's not a universal standard though... – aioobe Dec 03 '10 at 19:05
1

Regexes aren't good at picking up balanced pairs like parentheses. You'd do much better to parse the string without a regex.

Skilldrick
  • On the contrary, they are [really quite good at it](http://stackoverflow.com/questions/4031112/regular-expression-matching/4034386#4034386). – tchrist Dec 03 '10 at 15:14
  • @tchrist, you keep sweeping an important fact under the carpet: The "tricks" you use rely on an extension of regular expressions that is not a universal standard by far. I sincerely recommend you stick to the perl and php tags for this kind of comment. – aioobe Dec 03 '10 at 21:34
  • @aioobe: I sincerely recommend you stop pretending Java somehow has standard regexes by denigrating everybody else’s as mere "tricks". Having named buffers is hardly a trick. Supporting even **one single Unicode property from this millennium** is hardly a trick, including Unicode scripts and non-general categories. Supporting logical codepoints instead of UTF-16 is not a trick. Supporting grapheme clusters is no trick. Letting `"élève"` match `\b\w+\b` **ANYWHERE** is no trick. Not letting `"\t\n "` improperly match `^\s*\S+$` is no trick. What **is** a trick is dealing with Java’s brokenness! – tchrist Dec 03 '10 at 22:20
1

I believe it is virtually impossible to deal with nested structures using a RegEx. Much better to iterate through each character and keep track of how many open brackets you have.
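
A minimal sketch of that character-by-character approach, assuming the goal is the one stated in the question's comments (collect the contents of every parenthesized group). The class name `ParenGroups` and method `groups` are hypothetical. Instead of a bare counter it keeps a stack of the positions of the currently open brackets, so each closing bracket can be paired with its opener.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ParenGroups {
    // Returns the contents of every parenthesized group, innermost first
    // (that is, in the order the closing parentheses appear).
    static List<String> groups(String input) {
        List<String> out = new ArrayList<>();
        Deque<Integer> openPositions = new ArrayDeque<>();
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == '(') {
                openPositions.push(i);          // remember where this group starts
            } else if (ch == ')') {
                if (openPositions.isEmpty()) {
                    throw new IllegalArgumentException("unmatched ')' at index " + i);
                }
                int start = openPositions.pop();
                out.add(input.substring(start + 1, i)); // text between the pair
            }
        }
        if (!openPositions.isEmpty()) {
            throw new IllegalArgumentException(
                    "unmatched '(' at index " + openPositions.peek());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints "a * b", then "(a * b) + c", then "((a * b) + c) * d".
        for (String g : groups("(((a * b) + c) * d)")) {
            System.out.println(g);
        }
    }
}
```

The stack size at any point is the number of open brackets, so this is the same bookkeeping as a counter, with enough extra information to extract the substrings.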

Also, if you're aiming to evaluate a mathematical expression in infix notation, you would probably have more success using the shunting-yard algorithm.
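
For reference, a rough sketch of the shunting-yard idea, under simplifying assumptions that are mine and not part of the question: operands are single letters or digits, and the only operators are the left-associative binary `+`, `-`, `*` and `/`. The class name `ShuntingYardSketch` is hypothetical. It converts infix to postfix (RPN), which can then be evaluated with a simple stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ShuntingYardSketch {
    static boolean isOperator(char ch) {
        return ch == '+' || ch == '-' || ch == '*' || ch == '/';
    }

    // '*' and '/' bind tighter than '+' and '-'.
    static int precedence(char op) {
        return (op == '*' || op == '/') ? 2 : 1;
    }

    // Converts an infix expression with single-character operands to postfix.
    static String toPostfix(String infix) {
        StringBuilder out = new StringBuilder();
        Deque<Character> ops = new ArrayDeque<>();
        for (char ch : infix.toCharArray()) {
            if (Character.isWhitespace(ch)) {
                continue;
            } else if (Character.isLetterOrDigit(ch)) {
                out.append(ch).append(' ');
            } else if (isOperator(ch)) {
                // Left associativity: pop operators of higher or equal precedence.
                while (!ops.isEmpty() && isOperator(ops.peek())
                        && precedence(ops.peek()) >= precedence(ch)) {
                    out.append(ops.pop()).append(' ');
                }
                ops.push(ch);
            } else if (ch == '(') {
                ops.push(ch);
            } else if (ch == ')') {
                // Flush operators back to the matching '('.
                while (!ops.isEmpty() && ops.peek() != '(') {
                    out.append(ops.pop()).append(' ');
                }
                if (ops.isEmpty()) {
                    throw new IllegalArgumentException("unmatched ')'");
                }
                ops.pop(); // discard the '('
            } else {
                throw new IllegalArgumentException("unexpected character: " + ch);
            }
        }
        while (!ops.isEmpty()) {
            char op = ops.pop();
            if (op == '(') {
                throw new IllegalArgumentException("unmatched '('");
            }
            out.append(op).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toPostfix("(((a * b) + c) * d)")); // a b * c + d *
    }
}
```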

El Ronnoco
  • I was taught that the word “virtually” is always used as a dissembling euphemism (weasel-words, if you would) for “not”. And so it is here, since it actually means [not impossible](http://stackoverflow.com/questions/4031112/regular-expression-matching/4034386#4034386) after all. – tchrist Dec 03 '10 at 15:13