2

I know a little about RegEx, but this problem is well beyond my abilities.

I need help finding the text before the last open parenthesis that has no matching close parenthesis.

(It is for the CallTip feature of an open-source program in development.)

Some examples:

--------------------------
Text               I need
--------------------------
aaa(                  aaa
aaa(x)                ''
aaa(bbb(              bbb
aaa(y=bbb(            bbb
aaa(y=bbb()           aaa
aaa(y <- bbb()        aaa
aaa(bbb(x)            aaa
aaa(bbb(ccc(          ccc
aaa(bbb(x), ccc(      ccc
aaa(bbb(x), ccc()     aaa
aaa(bbb(x), ccc())    ''
--------------------------

Is it possible to write a RegEx (PCRE) for these situations?

The best I came up with was \([^\(]+$ but it is not good: it matches the opposite of what I need.

Can anyone help, please?

jcfaria
  • Regexes can't handle parenthesis matching. If you can guarantee a max depth, you might be able to do it. – Kevin Jun 06 '13 at 02:38
  • To clarify: it is the text before the last open-parenthesis that doesn't have a matching close-parenthesis, right? Do only `()` parentheses matter? – Floris Jun 06 '13 at 02:39
  • Welcome to StackOverflow. As it's written now, a perfectly suitable answer to your question is "Yes, it is possible" or "No, it is not possible" with no further information, which would not be helpful to you at all. Please [edit] your question to rephrase it to be more specific; if you don't, it might end up being closed. – Ken White Jun 06 '13 at 02:39
  • @Kevin PCRE in particular *can* handle arbitrary recursive nesting. – Martin Ender Jun 12 '13 at 21:08

4 Answers

3


Take a look at this JavaScript function

var recreg = function(x) {
    var r = /[a-zA-Z]+\([^()]*\)/;
    while (x.match(r)) x = x.replace(r, '');
    return x;
};

After applying this, you are left with only the unmatched parts, the calls that have no closing parenthesis, and we just need the last alphabetic word among them.

var lastpart = function(y) { return y.match(/([a-zA-Z]+)\([^(]*$/); };

The idea is to use it like

 lastpart(recreg('aaa(y <- bbb()'))

Then check whether the result is null; otherwise take the captured group, which is result[1]. Many regex engines, JavaScript's among them, don't support the (?R) construct that is needed for recursive regex matching.

Note that this is a sample JavaScript implementation that simulates a recursive regex. Read http://www.catonmat.net/blog/recursive-regular-expressions/
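To check the approach against the sample table in the question, here is a minimal sketch that restates the two helpers and wraps them in one call. The wrapper name `callTipName` is my own invention for illustration, and returning an empty string for "no match" is an assumption based on the '' entries in the question's table.

```javascript
// Repeatedly strip innermost balanced calls such as bbb(x) or ccc().
function recreg(x) {
  const r = /[a-zA-Z]+\([^()]*\)/;
  while (r.test(x)) x = x.replace(r, '');
  return x;
}

// Find the alphabetic word right before the last remaining '('.
function lastpart(y) {
  return y.match(/([a-zA-Z]+)\([^(]*$/);
}

// Hypothetical wrapper: returns the name, or '' when every '(' is closed.
function callTipName(text) {
  const m = lastpart(recreg(text));
  return m ? m[1] : '';
}

const samples = ['aaa(', 'aaa(x)', 'aaa(y <- bbb()', 'aaa(bbb(x), ccc(', 'aaa(bbb(x), ccc())'];
for (const s of samples) {
  console.log(JSON.stringify(s), '->', JSON.stringify(callTipName(s)));
}
```

Note that the helpers only recognize ASCII letters as names, so identifiers containing digits, dots or underscores would need a wider character class.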

Alan Moore
FUD
1

This works correctly on all your sample strings:

\w+(?=\((?:[^()]*\([^()]*\))*[^()]*$)

The most interesting part is this:

(?:[^()]*\([^()]*\))*

It matches zero or more balanced pairs of parentheses along with the non-paren characters before and between them (like the y=bbb() and bbb(x), ccc() in your sample strings). When that part is done, the final [^()]*$ ensures that there are no more parens before the end of the string.
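The pattern uses only a lookahead, so it can be tried outside PCRE as well. Here is a small sketch in JavaScript that runs it over the question's sample strings; the function name `nameBeforeUnclosedParen` is hypothetical, and mapping "no match" to an empty string is my assumption about what '' means in the question's table.

```javascript
// The one-level pattern from this answer, unchanged.
const oneLevel = /\w+(?=\((?:[^()]*\([^()]*\))*[^()]*$)/;

// Hypothetical helper: the matched word, or '' when nothing matches.
function nameBeforeUnclosedParen(text) {
  const m = text.match(oneLevel);
  return m ? m[0] : '';
}

const samples = [
  'aaa(', 'aaa(x)', 'aaa(bbb(', 'aaa(y=bbb(', 'aaa(y=bbb()', 'aaa(y <- bbb()',
  'aaa(bbb(x)', 'aaa(bbb(ccc(', 'aaa(bbb(x), ccc(', 'aaa(bbb(x), ccc()',
  'aaa(bbb(x), ccc())',
];
for (const s of samples) {
  console.log(JSON.stringify(s), '->', JSON.stringify(nameBeforeUnclosedParen(s)));
}
```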

Be aware, though, that this regex is based on the assumption that there will never be more than one level of nesting. In other words, it assumes these are valid:

aaa()
aaa(bbb())
aaa(bbb(), ccc())

...but this isn't:

aaa(bbb(ccc()))

The string aaa(bbb(ccc( in your samples seems to imply that multi-level nesting is indeed permitted. If that's the case, you won't be able to solve your problem with regex alone. (Sure, some regex flavors support recursive patterns, but the syntax is hideous even by regex standards. I guarantee you won't be able to read your own regex a week after you write it.)
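If deeper nesting does matter, one option that avoids recursive patterns is a plain left-to-right scan that tracks open parentheses on a stack. To be clear, this is a swapped-in technique and not a regex solution; the function name `nameBeforeLastUnclosedParen` is made up for illustration, and the sketch assumes parentheses never appear inside string literals or comments in the input.

```javascript
// Hypothetical helper: scan once, remembering where each unclosed '(' sits.
function nameBeforeLastUnclosedParen(text) {
  const open = []; // indices of '(' characters that have not been closed yet
  for (let i = 0; i < text.length; i++) {
    if (text[i] === '(') open.push(i);
    else if (text[i] === ')') open.pop(); // a stray ')' simply pops nothing
  }
  if (open.length === 0) return ''; // everything is balanced
  // Take the word that ends right before the last unclosed '('.
  const before = text.slice(0, open[open.length - 1]);
  const m = before.match(/\w+$/);
  return m ? m[0] : '';
}

console.log(nameBeforeLastUnclosedParen('aaa(bbb(ccc())'));
console.log(nameBeforeLastUnclosedParen('aaa(bbb(x), ccc('));
```

The only regex left is the trivial `\w+$` that picks up the name, so nesting depth no longer affects readability.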

Alan Moore
0

A partial solution - this is assuming that your regex is called from within a programming language that can loop.

1) Prune the input: find matching parentheses and remove them along with everything in between. Keep going until there is no match. The regex would look for ([^()]*) - an open parenthesis, any number of non-parenthesis characters, a close parenthesis. It has to be part of a "find and replace with nothing" loop. This trims "from the inside out".

2) After the pruning you have either no parentheses left or only unmatched ones. Now you have to find the word just before an open parenthesis. This requires a regex like \w+(. But that alone won't pick the right one if there are multiple unclosed parentheses. Taking the last one can be done with a greedy match (with a group around the final \w+): ^.*\b\w+( "as many characters as you can up to a word before a parenthesis" - this will find the last one. The \b keeps the greedy .* from swallowing the start of the word.

I say "partial" solution because, depending on the environment you are using, the syntax for referring to a matching group and whether you need a backslash before the literal ( and ) vary. I left that detail out as it's hard to check on my iPhone.
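For one concrete environment, the two steps could be sketched in JavaScript roughly as follows, with the literal parentheses escaped as that flavor requires. The function name `lastUnclosedCall` is hypothetical, and I am assuming single-line input (the `.` in step 2 does not cross newlines) and that an empty string is the wanted result when nothing is left unclosed.

```javascript
// Hypothetical helper implementing the prune-then-match idea.
function lastUnclosedCall(text) {
  // Step 1: remove innermost balanced pairs until none remain.
  const balancedPair = /\([^()]*\)/;
  let pruned = text;
  while (balancedPair.test(pruned)) {
    pruned = pruned.replace(balancedPair, '');
  }
  // Step 2: greedy .* pushes the match to the last word followed by '('.
  const m = pruned.match(/^.*\b(\w+)\(/);
  return m ? m[1] : '';
}

console.log(lastUnclosedCall('aaa(bbb(x), ccc('));
console.log(lastUnclosedCall('aaa(y=bbb()'));
```

Because the loop always removes the innermost pair first, the nesting depth of the closed calls does not matter.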

I hope this inspires you or others to come up with a complete solution.

Floris
0

I'm not sure which regex language/platform you're using, or whether it supports recursive subpatterns. However, the following two-step PHP code works for all the cases you listed above:

$str = 'aaa(bbb(x), ccc()'; // your original string

// step 1: remove every balanced pair of parentheses, including nested ones;
// (?1) recurses into capture group 1, and the x flag lets the pattern contain spaces
$repl = preg_replace('/ ( \( (?: [^()]* | (?1) )* \) ) /x', '', $str);

$matched = '';
// step 2: find the word just before the last opening parenthesis in the replaced string
if (preg_match('/\w+(?=[^\w(]*\([^(]*$)/', $repl, $arr))
   $matched = $arr[0];
echo "*** Matched: [$matched]\n";

Live Demo: http://ideone.com/evXQYt

anubhava