9

When I have a string like this:

s1 = 'stuff(remove_me)'

I can easily remove the parentheses and the text within using

# returns 'stuff'
res1 = re.sub(r'\([^)]*\)', '', s1)

as explained here.

But I sometimes encounter nested expressions like this:

s2 = 'stuff(remove(me))'

When I run the command from above, I end up with

'stuff)'

I also tried:

re.sub(r'\(.*?\)', '', s2)

which gives me the same output.

How can I remove everything within the outer parentheses - including the parentheses themselves - so that I also end up with 'stuff' (which should work for arbitrarily complex expressions)?

Cleb
  • Check [*Remove text between () and \[\] in python*](http://stackoverflow.com/a/14598135/3832970). – Wiktor Stribiżew May 30 '16 at 14:44
  • @WiktorStribiżew: Thanks! But that is about expressions which are not nested. And I am pretty sure that there exists something which does not require a lot of if-else clauses and a for-loop. – Cleb May 30 '16 at 14:48
  • This [answer](http://stackoverflow.com/a/12280660/3832970) contains the regex you need, but it requires the PyPI regex module. – Wiktor Stribiżew May 30 '16 at 14:54

6 Answers

19

NOTE: \(.*\) matches the first ( from the left, then matches any 0+ characters (other than a newline if a DOTALL modifier is not enabled) up to the last ), and does not account for properly nested parentheses.

To remove nested parentheses correctly with a regular expression in Python, you may use a simple \([^()]*\) (matching a (, then 0+ chars other than ( and ) and then a )) in a while block using re.subn:

import re

def remove_text_between_parens(text):
    n = 1  # run at least once
    while n:
        text, n = re.subn(r'\([^()]*\)', '', text)  # remove non-nested/flat balanced parts
    return text

Basically: keep removing every (...) that has no ( or ) inside until no match is found. Usage:

print(remove_text_between_parens('stuff (inside (nested) brackets) (and (some(are)) here) here'))
# => stuff   here
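To make the loop easier to follow, here is a small sketch that records the string after each re.subn pass. The helper name trace_paren_removal is made up for illustration; the pattern is the same one used above.

```python
import re

def trace_paren_removal(text):
    """Return the input followed by the string after each pass that removed something."""
    steps = [text]
    while True:
        # Each pass strips only the innermost (flat) pairs.
        text, n = re.subn(r'\([^()]*\)', '', text)
        if n == 0:
            break
        steps.append(text)
    return steps

for step in trace_paren_removal('stuff(remove(me))'):
    print(repr(step))
# 'stuff(remove(me))'
# 'stuff(remove)'
# 'stuff'
```

Each pass peels off one level of nesting, so the number of passes grows with the nesting depth, not with the number of groups.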

A non-regex way is also possible:

def removeNestedParentheses(s):
    ret = ''
    skip = 0
    for i in s:
        if i == '(':
            skip += 1
        elif i == ')' and skip > 0:
            skip -= 1
        elif skip == 0:
            ret += i
    return ret

x = removeNestedParentheses('stuff (inside (nested) brackets) (and (some(are)) here) here')
print(x)
# => stuff   here

See another Python demo

Wiktor Stribiżew
  • Just in case one needs to use the `re` approach to remove nested square brackets, use `r'\[[^][]*]'` pattern. For curly braces, use `r'{[^{}]*}'` – Wiktor Stribiżew Jan 01 '21 at 12:59
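A possible way to combine the three patterns into one helper, as a sketch only. The names FLAT_PAIR and remove_nested are hypothetical; the patterns themselves are the ones given in the answer and in the comment above.

```python
import re

# One pattern per bracket type, each matching a single flat (innermost) pair.
FLAT_PAIR = {
    '()': r'\([^()]*\)',
    '[]': r'\[[^][]*]',
    '{}': r'{[^{}]*}',
}

def remove_nested(text, brackets='()'):
    """Repeatedly strip innermost pairs of the chosen bracket type."""
    pattern = re.compile(FLAT_PAIR[brackets])
    n = 1  # run at least once
    while n:
        text, n = pattern.subn('', text)
    return text

print(remove_nested('stuff(remove(me))'))    # stuff
print(remove_nested('a[b[c]]d', '[]'))       # ad
print(remove_nested('x{y{z}}w', '{}'))       # xw
```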
6

As mentioned before, you'd need a recursive regex to match arbitrary levels of nesting. But if you know there is at most one level of nesting, try this pattern:

\((?:[^)(]|\([^)(]*\))*\)
  • [^)(] matches a single character that is not a parenthesis (negated class).
  • |\([^)(]*\) or it matches another ( ) pair with any number of non-parenthesis characters inside.
  • (?:...)* repeats all of this any number of times inside the outer ( ).

Here is a demo at regex101

In the first alternative, [^)(] is used without a + quantifier so that the pattern fails faster on unbalanced input.
You need to add as many levels of nesting as might occur. E.g. for a maximum of 2 levels:

\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)

Another demo at regex101
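Since each extra level just wraps the previous pattern, the pattern can also be generated instead of written by hand. A minimal sketch, assuming a hypothetical helper name paren_pattern; it reproduces the two patterns shown above for 1 and 2 levels.

```python
import re

def paren_pattern(max_nesting):
    """Build a pattern for a ( ) group containing up to max_nesting nested levels."""
    pattern = r'\([^)(]*\)'  # flat pair, no nesting inside
    for _ in range(max_nesting):
        # Wrap the previous pattern: non-parenthesis chars or one shallower group.
        pattern = r'\((?:[^)(]|' + pattern + r')*\)'
    return pattern

print(paren_pattern(1))
# \((?:[^)(]|\([^)(]*\))*\)
print(re.sub(paren_pattern(1), '', 'stuff(remove(me))'))
# stuff
print(re.sub(paren_pattern(1), '', 'a(b(c(d)))e'))
# a(b)e
```

The last call shows what happens when the input nests deeper than the pattern allows: only the inner part that fits is removed, so the limit has to be chosen with the data in mind.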

bobble bubble
  • Very nice, thanks for the detailed explanation (upvoted)! – Cleb May 30 '16 at 19:05
  • I just came across a similar situation and searched a lot for this solution. Thanks for sharing this idea with good explanation. – jazzurro Feb 15 '19 at 11:42
1

Quantifiers in re are greedy by default, so they try to match as much text as possible. For the simple test case you mention, just let the regex run:

>>> re.sub(r'\(.*\)', '', 'stuff(remove(me))')
'stuff'
alexamici
  • @Cleb be warned that this doesn't check if the braces are matched. E.g. in `foo(bar)baz(spam)e)ggs`, it'll leave only `fooggs`. – ivan_pozdeev May 30 '16 at 14:52
  • @ivan_pozdeev: Thanks for the warning, good to know! In my examples they should be matched but I'll add a check anyway. – Cleb May 30 '16 at 15:10
1

If you are sure that the parentheses are balanced and that the string contains only one top-level parenthesized group, just use the greedy version:

re.sub(r'\(.*\)', '', s2)
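A quick comparison may help decide whether the greedy pattern is enough. This is only a sketch: the function names and the second sample string are invented for illustration, and the loop is the innermost-first approach from the re.subn answer on this page.

```python
import re

def remove_greedy(text):
    # Removes everything from the first '(' to the last ')'.
    return re.sub(r'\(.*\)', '', text)

def remove_innermost_first(text):
    # Strips flat pairs repeatedly until none are left.
    n = 1
    while n:
        text, n = re.subn(r'\([^()]*\)', '', text)
    return text

single = 'stuff(remove(me))'
several = 'keep(a)this(b)too'

print(remove_greedy(single))             # stuff
print(remove_greedy(several))            # keeptoo
print(remove_innermost_first(several))   # keepthistoo
```

With several separate groups, the greedy version also swallows the text between them, even though the parentheses are balanced.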
Serge Ballesta
1

https://regex101.com/r/kQ2jS3/1

'(\(.*\))'

This captures everything from the first opening parenthesis to the last closing parenthesis, including the parentheses themselves.

Your old regex matches from the first opening parenthesis only up to the next closing parenthesis.

Bryce Drew
0

I have found a solution here:

http://rachbelaid.com/recursive-regular-experession/

which says:

>>> import regex
>>> regex.search(r"^(\((?1)*\))(?1)*$", "()()") is not None
True
>>> regex.search(r"^(\((?1)*\))(?1)*$", "(((()))())") is not None
True
>>> regex.search(r"^(\((?1)*\))(?1)*$", "()(") is not None
False
>>> regex.search(r"^(\((?1)*\))(?1)*$", "(((())())") is not None
False
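If installing the regex module is not an option, a similar balance check can be sketched with a plain counter. This is not the same as the recursive pattern above: the function name is_balanced is made up, and unlike the pattern it ignores non-parenthesis characters and accepts an empty string.

```python
def is_balanced(text):
    """Return True if every ')' closes an earlier '(' and no '(' is left open."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0

print(is_balanced('()()'))         # True
print(is_balanced('(((()))())'))   # True
print(is_balanced('()('))          # False
print(is_balanced('(((())())'))    # False
```

A check like this could run before any of the removal approaches on this page, since most of them assume balanced input.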
ozdemir