4

Possible Duplicate:
Can regular expressions be used to match nested patterns?

I have a string like this:

$string = "Hustlin' ((Remix) Album Version (Explicit))";

and I want to basically remove everything in parentheses. In the case above with nested parentheses, I want to just remove it at the top level.

So I expect the result to just be "Hustlin' ".

I tried the following:

preg_replace("/\(.*?\)/", '', $string);

which returns the odd result of: "Hustlin' Album Version )".

Can someone explain what happened here?

Andy Hin
  • **Do not close this as a dup, especially of the one in the close votes: *THEY ARE ALL WRONG*** because these are PHP patterns, and hence Perl patterns, and thus it is perfectly possible to remove nested parens this way, since like many modern pattern-matching engines, Perl supports recursive patterns. – tchrist Oct 07 '12 at 00:25

3 Answers

10

Your pattern \(.*?\) matches a ( and then lazily runs to the first ) it can find (with everything in between). It has no notion of nesting: how would the pattern "understand" to match balanced parentheses?

However, you could make use of PHP's recursive patterns (the PCRE `(?R)` construct):

$string = "Hustlin' ((Remix) Album Version (Explicit)) with (a(bbb(ccc)b)a) speed!";
echo preg_replace("/\(([^()]|(?R))*\)/", "", $string) . "\n";

would print:

Hustlin'  with  speed!

A short breakdown of the pattern:

\(         # match a '('
(          # start match group 1
  [^()]    #   any char except '(' and ')'
  |        #   OR
  (?R)     #   match the entire pattern recursively
)*         # end match group 1 and repeat it zero or more times
\)         # match a ')'
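
The breakdown above can also be kept inside the code itself. The following is only a sketch: it assumes the `x` (extended) modifier, which makes PCRE ignore whitespace and `#` comments in the pattern, and the variable names are made up for illustration. The second call tries the possessive variant and the test string that come up in the comments.

```php
<?php
// Same pattern as in the answer, written with the x modifier so the
// explanation can live next to each part. Note: the comments inside the
// pattern must not contain the "/" delimiter or a single quote.
$pattern = '/
    \(           # match a "("
    (            # start group 1
      [^()]      #   any char except "(" and ")"
      |          #   OR
      (?R)       #   recurse into the entire pattern
    )*           # end group 1, repeat zero or more times
    \)           # match a ")"
/x';

$string = "Hustlin' ((Remix) Album Version (Explicit)) with (a(bbb(ccc)b)a) speed!";
$result = preg_replace($pattern, '', $string);
echo $result . "\n";

// Possessive variant from the comments, on a string with empty parentheses.
$result2 = preg_replace('/\(([^()]++|(?R))*+\)/', '', 'ABC (DEF () GHI) JKL');
echo $result2 . "\n";
```

Both calls remove each top-level group together with everything nested inside it, so only the text outside the parentheses (including the surrounding spaces) is left.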
Bart Kiers
  • Can you please explain why `(?R)` part has to be exactly where it is? – Ωmega Oct 06 '12 at 19:41
  • @Ωmega, inside the `( ... )*` loop, you either match a non-parenthesis character (the `[^()]` part), or you match a set of balanced parentheses (which is the entire pattern itself again, denoted by `(?R)`). – Bart Kiers Oct 06 '12 at 19:49
  • Why not `\(([^()]*|(?R))*\)`..? – Ωmega Oct 06 '12 at 19:54
  • @Ωmega, sure, that's also good. As is this: `/\(([^()]*+|(?R))*\)/` or this: `/\(([^()]++|(?R))*+\)/`. – Bart Kiers Oct 06 '12 at 19:56
  • @Ωmega, probably the last one: `/\(([^()]++|(?R))*+\)/`, but feel free to test them all to see for yourself! I often post the less obscure one to make my life easier explaining the pattern (which I think is the one in my answer). Besides, if one is really worried about performance, and there are massive amounts of text to be processed, regex might not be the go-to tool. – Bart Kiers Oct 06 '12 at 20:04
  • I was just curious about possessive quantifiers, as I don't use them much, but I want to learn. Would `/\(([^()]++|(?R))*+\)/` also remove the `()` part from the string `ABC (DEF () GHI) JKL`..? – Ωmega Oct 06 '12 at 20:08
  • @Ωmega, yes, it would also remove *empty* parentheses because of the `*` at the end: `...(?R))*+\)/`. Test it here: http://ideone.com/YGaYJ – Bart Kiers Oct 06 '12 at 20:20
  • Further reading about recursive regular expressions: http://blog.angeloff.name/post/2012/08/05/php-recursive-patterns/ – We Are All Monica Mar 10 '16 at 11:22
  • Example of the regex engine stepping through a recursive pattern: http://stackoverflow.com/a/8442349/106302 – We Are All Monica Mar 10 '16 at 11:31
0

Make a simple loop with a regex replacement inside.

Replace only the first occurrence of

/\([^()]*\)/

with an empty string, and repeat that until no match is found.
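
A minimal sketch of that loop, assuming PHP 7 or later. The function name `strip_parens_loop` is hypothetical; the `limit` and `count` arguments are the regular fourth and fifth parameters of `preg_replace`.

```php
<?php
// Hypothetical helper: strips innermost "(...)" groups one at a time, so
// outer groups become innermost on a later pass and get removed as well.
function strip_parens_loop(string $s): string
{
    do {
        // Limit 1 replaces only the first occurrence; $count receives the
        // number of replacements made (0 once nothing matches any more).
        $s = preg_replace('/\([^()]*\)/', '', $s, 1, $count);
    } while ($count > 0);

    return $s;
}

$looped = strip_parens_loop("Hustlin' ((Remix) Album Version (Explicit))");
echo $looped . "\n";
```

For the string from the question this takes three passes: `(Remix)`, then `(Explicit)`, then the outer group that is left over.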

hakre
Ωmega
  • @NullUserException, hammering on the fact that the solution is wrong is not really constructive: how about explaining *what* is wrong? – Bart Kiers Oct 06 '12 at 19:23
  • @NullUserException, I knew that, but your latest reply is at least an explanation which is better than simply posting that "it's wrong". Don't you agree? I guess the summer of love is gone... :) – Bart Kiers Oct 06 '12 at 19:28
  • After the edit, I suggest you add a little PHP code as well, because that loop is not that straightforward to write if you do not know the preg functions well. – hakre Oct 06 '12 at 19:29
  • @BartKiers I was going to post an explanation, but got distracted by kitten videos. I removed the downvote after the OP fixed his answer BTW. – NullUserException Oct 06 '12 at 19:30
  • @hakre - Thanks for correcting the **"until match" → "until no match"** typo :) Bart Kiers came up with a recursive solution within the regex, which I believe is much faster than a loop outside the regex, so code for my idea is not needed... – Ωmega Oct 06 '12 at 19:45
0

To answer your question: your preg_replace pattern matches from an opening parenthesis up to the first closing parenthesis after it and removes that span together with everything inside it, no matter whether another opening parenthesis occurs in between.
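
To see this concretely, one option is to list what the lazy pattern actually matches before anything is replaced. This is just an illustrative sketch using `preg_match_all`; the variable names are arbitrary.

```php
<?php
$string = "Hustlin' ((Remix) Album Version (Explicit))";

// Collect every span the lazy pattern matches, scanning left to right.
preg_match_all('/\(.*?\)/', $string, $matches);
print_r($matches[0]);   // two matches: "((Remix)" and "(Explicit)"

// Removing exactly those two spans leaves the text between them
// and the final, unmatched ")".
$result = preg_replace('/\(.*?\)/', '', $string);
echo $result . "\n";
```

The first match starts at the outer `(` but already stops at the `)` that closes `(Remix)`, which is why ` Album Version ` and the last `)` survive.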

matthias