Remove text between () and []

Question

I have a very long string of text with () and [] in it. I'm trying to remove the characters between the parentheses and brackets but I cannot figure out how.

The string is similar to this:

x = "This is a sentence. (once a day) [twice a day]"

This string isn't exactly what I'm working with, but it is very similar and a lot shorter.

Please show what you've tried (by editing your question NOT by adding a comment), and people will point you in the right direction. — mechanical_meat, Jan 30 '13 at 04:50

score 156 · Answer 1 · edited Apr 12 '21 at 17:06

You can use the re.sub function.

>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"([\(\[]).*?([\)\]])", r"\g<1>\g<2>", x)
'This is a sentence. () []'

If you also want to remove the [] and the () themselves, you can use this code:

>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '

Important: This code will not work with nested brackets.

Explanation

The first regex captures ( or [ as group 1 (by surrounding it with parentheses) and ) or ] as group 2, matching these groups and all characters that come in between them. After matching, the matched portion is substituted with groups 1 and 2, leaving the final string with nothing inside the brackets. The second regex follows the same idea without the groups: it matches the brackets together with everything between them and substitutes the empty string.

-- modified from comment by Ajay Thomas
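
As a sketch of the explanation above, the first pattern can be written with re.VERBOSE so that each part carries its own comment. The name BRACKET_CONTENT is an assumption introduced here for illustration; the behavior should match the one-line version.

```python
import re

# Hypothetical name, introduced for illustration only.
BRACKET_CONTENT = re.compile(
    r"""
    ([(\[])   # group 1: an opening ( or [
    .*?       # as few characters as possible
    ([)\]])   # group 2: the first closing ) or ] that follows
    """,
    re.VERBOSE,
)

x = "This is a sentence. (once a day) [twice a day]"

# Keep the delimiters, drop what is between them.
emptied = BRACKET_CONTENT.sub(r"\g<1>\g<2>", x)
print(repr(emptied))  # 'This is a sentence. () []'

# Drop the delimiters as well.
removed = BRACKET_CONTENT.sub("", x)
print(repr(removed))  # 'This is a sentence.  '
```

Because either closing character ends the match, a mismatched pair such as (a] is also treated as a pair by this pattern.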

edited Apr 12 '21 at 17:06 by wjandrea · answered Jan 30 '13 at 08:10 by jvallver

it doesn't work if `x = "ewq[a [(b] ([c))]]"`, it gives `'ewq )]]'` not `'ewq'`... – pradyunsg Jan 30 '13 at 09:30
@paddila I know but Tic does not say anything about nested symbols. – jvallver Jan 30 '13 at 09:34
I commented asking him about it.. he hasn't responded yet – pradyunsg Jan 30 '13 at 09:35
can someone explain the regex used here? – markroxor Mar 19 '18 at 10:05
@markroxor the first regex groups '(' and '[' into group 1 (by surrounding it with parentheses) and ')' and ']' into group 2, matching these groups and all characters that come in between the two groups. After matching, the matched portion is substituted with groups 1 and 2, leaving the final string with nothing inside the brackets. The second regex is self-explanatory from this: match everything and substitute with the empty string. Hope it helps – Ajay Thomas Apr 17 '18 at 17:38

pradyunsg · Answer 2 · 2013-01-30T12:04:20.573

Run this script; it works even with nested brackets.
It uses only simple counters and comparisons.

def a(test_str):
    ret = ''
    skip1c = 0  # depth of currently open [ brackets
    skip2c = 0  # depth of currently open ( parentheses
    for i in test_str:
        if i == '[':
            skip1c += 1
        elif i == '(':
            skip2c += 1
        elif i == ']' and skip1c > 0:
            skip1c -= 1
        elif i == ')' and skip2c > 0:
            skip2c -= 1
        elif skip1c == 0 and skip2c == 0:
            # outside all brackets: keep the character
            ret += i
    return ret

x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print(x)
print(repr(x))

In case you don't run it, here's the output:

>>> 
ewq This is a sentence.  
'ewq This is a sentence.  '

score 19 · Answer 3 · edited May 23 '17 at 10:31

Here's a solution similar to @pradyunsg's answer (it works with arbitrary nested brackets):

def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2) # count open/close brackets
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b: # found bracket
                kind, is_close = divmod(i, 2)
                count[kind] += (-1)**is_close # `+1`: open, `-1`: close
                if count[kind] < 0: # unbalanced bracket
                    count[kind] = 0  # keep it
                else:  # found bracket to remove
                    break
        else: # character is not a [balanced] bracket
            if not any(count): # outside brackets
                saved_chars.append(character)
    return ''.join(saved_chars)

print(repr(remove_text_inside_brackets(
    "This is a sentence. (once a day) [twice a day]")))
# -> 'This is a sentence.  '

Looks complex at first glance, but is better than mine (and definitely the accepted (my opinion)) — pradyunsg, Mar 17 '13 at 15:54

score 14 · Accepted Answer · edited Feb 08 '21 at 20:53

This should work for parentheses. A regular expression "consumes" the text it has matched, so it won't work for nested parentheses.

import re
mystring = "This is a sentence. (once a day) [twice a day]"  # sample input
regex = re.compile(r".*?\((.*?)\)")
result = regex.findall(mystring)  # ['once a day']

Alternatively, this finds one set of parentheses; loop to find more:

start = mystring.find("(")
end = mystring.find(")", start + 1)  # first ")" after the "("
if start != -1 and end != -1:
    result = mystring[start + 1:end]
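
The "loop to find more" step might look like the sketch below, which collects every parenthesized segment by restarting the search after each match. The function name find_parenthesized and the sample string are assumptions for illustration. Like the snippets above, it extracts the text rather than removing it, and it does not handle nesting.

```python
def find_parenthesized(mystring):
    """Return the text inside each non-nested (...) pair, left to right."""
    results = []
    pos = 0
    while True:
        start = mystring.find("(", pos)
        if start == -1:
            break
        end = mystring.find(")", start + 1)
        if end == -1:
            break  # unmatched "(": stop looking
        results.append(mystring[start + 1:end])
        pos = end + 1  # continue after this pair
    return results


print(find_parenthesized("a (b) c (d e) f"))  # ['b', 'd e']
```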

edited Feb 08 '21 at 20:53 by Moot · answered Jan 30 '13 at 05:14 by mbowden

I don't know why this answer was marked as correct. The question is asking to *remove* text, not return it. I had the same need (remove text between certain chars) and @jvallver's answer helped me. – Marcelo Assis Sep 30 '15 at 19:08
This achieves the opposite of what the OP asked for – simone Dec 07 '17 at 17:16

score 6 · Answer 5 · answered Apr 01 '21 at 08:40

You can split, filter, and join the string again. If your brackets are balanced and not nested, the following code should do.

import re
x = "".join(re.split(r"\(|\)|\[|\]", x)[::2])

answered Apr 01 '21 at 08:40 by user3592579

Very late, but very better. :-P – Zach Feb 07 '22 at 13:55
Just what I needed - short and sweet! – TCSGrad Apr 24 '22 at 19:49
Just know this also doesn't work on something like "A((B))C" – user667804 Feb 24 '23 at 13:40
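
For properly nested input such as "A((B))C", one possible workaround (a sketch, not part of the answer above) is to remove innermost pairs repeatedly until nothing changes. The names INNERMOST and remove_nested are assumptions introduced here; unbalanced or crossed brackets are left in place.

```python
import re

# Matches a (...) or [...] pair that contains no other bracket characters,
# i.e. an innermost pair.
INNERMOST = re.compile(r"\([^()\[\]]*\)|\[[^()\[\]]*\]")


def remove_nested(text):
    """Strip innermost bracket pairs repeatedly until none remain."""
    while True:
        stripped = INNERMOST.sub("", text)
        if stripped == text:
            return text
        text = stripped


print(remove_nested("A((B))C"))  # AC
print(repr(remove_nested("This is a sentence. (once a day) [twice a day]")))
# 'This is a sentence.  '
```

Each pass peels off one level of nesting, so the number of passes grows with the nesting depth rather than the length of the text.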

score 5 · Answer 6 · answered Jul 11 '22 at 09:06

You can try this. It removes the brackets together with the content inside them.

import re
x = "This is a sentence. (once a day) [twice a day]"
x = re.sub(r"\(.*?\)|\[.*?\]", "", x)
print(x)

Expected output:

This is a sentence.

answered Jul 11 '22 at 09:06 by Avinash Raut

Dave Trost · Answer 7 · 2022-11-01T23:17:37.317

For anyone who appreciates the simplicity of the top-voted answer by jvallver, and is looking for more readability from their code:

>>> import re
>>> x = 'This is a sentence. (once a day) [twice a day]'
>>> opening_braces = r'\(\['
>>> closing_braces = r'\)\]'
>>> non_greedy_wildcard = '.*?'
>>> re.sub(f'[{opening_braces}]{non_greedy_wildcard}[{closing_braces}]', '', x)
'This is a sentence.  '

Most of the explanation for why this regex works is included in the code. Your future self will thank you for the 3 additional lines.

(Replace the f-string with the equivalent string concatenation for Python 2 compatibility.)
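
A sketch of that concatenation variant, using the same three names as above. The variable name pattern is introduced here for illustration, and raw strings are used to avoid invalid-escape warnings on newer Python versions.

```python
import re

x = 'This is a sentence. (once a day) [twice a day]'
opening_braces = r'\(\['
closing_braces = r'\)\]'
non_greedy_wildcard = '.*?'

# Build the pattern with + instead of an f-string.
pattern = '[' + opening_braces + ']' + non_greedy_wildcard + '[' + closing_braces + ']'
print(repr(re.sub(pattern, '', x)))  # 'This is a sentence.  '
```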

score 0 · Answer 8 · answered Oct 31 '22 at 06:25

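The regex \(.*?\)|\[.*?\] removes bracketed content by matching pairs: at each position it tries a parenthesized span first, then a square-bracketed one. It also works when one bracket type is nested inside the other, as in the example below, because the non-greedy match runs to the first closing bracket of the same type. Of course, it breaks on unbalanced brackets, and on same-type nesting such as ((a)).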


import re

_brackets = re.compile(r"\(.*?\)|\[.*?\]")
_spaces = re.compile(r"\s+")

# Replace each bracketed span with a space, then collapse runs of whitespace.
_b = _brackets.sub(" ", "microRNAs (miR) play a role in cancer ([1], [2])")
_s = _spaces.sub(" ", _b.strip())
print(_s)

# OUTPUT: microRNAs play a role in cancer
