1

I have a Python string from which I need to remove parenthesized text. The standard way is to use text = re.sub(r'\([^)]*\)', '', text), which removes each pair of parentheses together with the content inside it.

However, I just found a string that looks like (Data with in (Boo) And good luck). With the regex I use, the " And good luck)" part is left behind. I know I can scan through the entire string, keep a counter of ( and ), and, whenever the counts balance, record the positions of the ( and ) and remove everything in between, but is there a better/cleaner way to do that? It doesn't need to be regex; whatever works is great, thanks.

Someone asked for the expected result, so here's what I am expecting:

Hi this is a test ( a b ( c d) e) sentence

After the replacement I want it to be Hi this is a test sentence, instead of Hi this is a test e) sentence
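To make the failure concrete, here is a minimal reproduction with the pattern from the question. It only shows the behaviour described above; the variable names are mine.

```python
import re

text = "Hi this is a test ( a b ( c d) e) sentence"

# [^)]* stops at the first ')', so the match ends at the inner ')'
# and the tail of the outer group is left behind.
result = re.sub(r'\([^)]*\)', '', text)
print(repr(result))
```

The leftover "e)" is the part of the outer group that follows the inner closing parenthesis.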

JLTChiu
  • It isn't possible to do it with the re module, but you can do it with the regex module, which allows recursion. https://pypi.python.org/pypi/regex – Casimir et Hippolyte Aug 18 '16 at 19:39
  • In the worst case you can do it with the re module if you build a pattern to match the innermost parentheses `\([^()]*\)` and loop the replacement until there is nothing left to replace. But it isn't a very elegant way, since you need to parse the string several times. – Casimir et Hippolyte Aug 18 '16 at 19:46
  • Are you open to non-regex solutions? – Dan Aug 18 '16 at 19:49
  • Can you please share what you expect from the example you gave, to make it clearer? – Heval Aug 18 '16 at 19:51
  • I only see one space in the result between "test" and "sentence". If that's the case, are you saying we need to remove a space before "("? Or remove a space after a ")"? – beetea Aug 19 '16 at 06:12
  • Will the input always have matching parentheses? If not, what is the desired behavior in cases with non-matching parentheses? – beetea Aug 19 '16 at 06:22

5 Answers

5

With the re module (remove the innermost parentheses repeatedly until there is nothing left to replace):

import re

s = r'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

nb_rep = 1

# Each pass strips the innermost groups; stop once a pass replaces nothing.
while nb_rep:
    s, nb_rep = re.subn(r'\([^()]*\)', '', s)

print(s)
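If you want to reuse the loop above, it could be wrapped in a small helper. This is only a sketch: the name strip_nested_parens and the final whitespace collapse (to get the single space the question asks for) are my assumptions, not part of the original answer.

```python
import re

def strip_nested_parens(text):
    """Delete innermost (...) groups until none remain, then tidy spacing."""
    count = 1
    while count:
        # subn returns the new string and the number of replacements made.
        text, count = re.subn(r'\([^()]*\)', '', text)
    # Collapse the runs of whitespace left where groups were removed.
    return ' '.join(text.split())

print(strip_nested_parens('Hi this is a test ( a b ( c d) e) sentence'))
```

Note that the whitespace collapse affects the whole string, not just the spots where a group was removed, so drop that line if the original spacing matters.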

With the regex module that allows recursion:

import regex

s = r'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

print(regex.sub(r'\([^()]*+(?:(?R)[^()]*)*+\)', '', s))

Where (?R) refers to the whole pattern itself.

Casimir et Hippolyte
2

First I split the line into tokens that do not contain parentheses, then join them into a new line. Note that this removes only the parenthesis characters and keeps the text between them:

import re

line = "(Data with in (Boo) And good luck)"
new_line = "".join(re.split(r'[()]', line))
print(new_line)
# Data with in Boo And good luck
1

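No regex. This removes everything from the first parenthesis to the last one, so it assumes a single outer group that does not start at index 0: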

>>> a = 'Hi this is a test ( a b ( c d) e) sentence'
>>> o = ['(' == t or t == ')' for t in a]
>>> o
[False, False, False, False, False, False, False, False, False, False,
 False, False, False, False, False, False, False, False, True, False, False, 
 False, False, False, True, False, False, False, False, True, False, False,
 True, False, False, False, False, False, False, False, False, False]
>>> start,end=0,0
>>> for n,i in enumerate(o):
...  if i and not start:
...   start = n
...  if i and start:
...   end = n
...
>>>
>>> start
18
>>> end
32
>>> a1 = ' '.join(''.join(i for n,i in enumerate(a) if (n<start or n>end)).split())
>>> a1
'Hi this is a test sentence'
>>>
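The same first-to-last idea can probably be written more directly with str.find and str.rfind. This is a sketch of a swapped-in technique rather than the code from the session above; the function name is hypothetical, and it keeps the same limitation that a string with several separate groups loses the text between them.

```python
def drop_first_to_last_paren(text):
    """Remove the span from the first '(' to the last ')', then tidy spacing."""
    start = text.find('(')
    end = text.rfind(')')
    # Nothing to remove if either parenthesis is missing or they are out of order.
    if start == -1 or end < start:
        return text
    kept = text[:start] + text[end + 1:]
    return ' '.join(kept.split())

print(drop_first_to_last_paren('Hi this is a test ( a b ( c d) e) sentence'))
```

Unlike the session above, this also works when the opening parenthesis is the first character.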
yourstruly
1

Assuming (1) the parentheses are always matched and (2) we only remove the parentheses and everything in between them (i.e. spaces surrounding the parentheses are left untouched), the following should work.

It's basically a state machine that maintains the current depth of nested parentheses. We keep a character only if (1) it's not a parenthesis and (2) the current depth is 0.

No regexes. No recursion. A single pass through the input string without any intermediate lists.

tests = [
    "Hi this is a test ( a b ( c d) e) sentence",
    "(Data with in (Boo) And good luck)",
]

delta = {
    '(': 1,
    ')': -1,
}

def remove_paren_groups(text):
    depth = 0

    for c in text:
        d = delta.get(c, 0)
        depth += d
        # Skip the parentheses themselves and anything nested inside them.
        if d != 0 or depth > 0:
            continue
        yield c

for text in tests:
    print(' IN: %r' % text)
    print('OUT: %r' % ''.join(remove_paren_groups(text)))

Output:

 IN: 'Hi this is a test ( a b ( c d) e) sentence'
OUT: 'Hi this is a test  sentence'
 IN: '(Data with in (Boo) And good luck)'
OUT: ''
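If assumption (1) does not hold, one possible variant is to never let the depth go below zero. This is a sketch under my own assumptions about the desired behaviour, which the question does not specify: a stray ")" is simply dropped, and an unmatched "(" swallows the rest of the string. The function name is hypothetical.

```python
def remove_paren_groups_lenient(text):
    """Depth-counting removal that tolerates unbalanced parentheses."""
    depth = 0
    kept = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            # Only close a group that is actually open; a stray ')' is dropped.
            if depth > 0:
                depth -= 1
        elif depth == 0:
            kept.append(ch)
    return ''.join(kept)

print(repr(remove_paren_groups_lenient('a ) b (c) d')))
```

For balanced input it gives the same result as the generator above.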
beetea
0

Referenced from here

import re

item = "example (.com) w3resource github (.com) stackoverflow (.com)"

# If non-ASCII characters cause problems, drop them first:
item = item.encode('ascii', errors='ignore').decode()

# Remove each parenthesized group together with one preceding space.
print(re.sub(r" ?\([^)]+\)", "", item))
Mark K