Remove text between () and [] based on condition in Python?

Question

I'm trying to remove the characters between the parentheses and brackets based on the length of characters inside the parentheses and brackets.

Using this:

def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2) # count open/close brackets
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b: # found bracket
                kind, is_close = divmod(i, 2)
                count[kind] += (-1)**is_close # `+1`: open, `-1`: close
                if count[kind] < 0: # unbalanced bracket
                    count[kind] = 0  # keep it
                else:  # found bracket to remove
                    break
        else: # character is not a [balanced] bracket
            if not any(count): # outside brackets
                saved_chars.append(character)
    return ''.join(saved_chars)

I'm able to remove the characters between the parentheses and brackets, but I cannot figure out how to remove the characters based on the length of characters inside.

I want to remove the text together with its parentheses or brackets if the enclosed text is 4 characters or shorter; if it is longer than 4 characters, only the parentheses or brackets should be removed. Sample text:

text = "This is a sentence. (RMVE) (Once a day) [twice a day] [RMV]"

Output:

print(remove_text_inside_brackets(text))

This is a sentence.

Desired Output:

This is a sentence. Once a day twice a day

What would be the output of `"(ab[XX]c)"`? If you remove `[XX]` you'd have `(abc)` which violates the rule "I wanted to remove characters between the parentheses and brackets if the length <=4 with parentheses" and leaving it would also violate the same rule. — Ch3steR, Dec 20 '21 at 06:32
@Ch3steR, You are right, it violates. But in that scenario, I want to remove `(ab[XX]c)`. But so far I don't have such cases in my text. — Ailurophile, Dec 20 '21 at 06:52

score 3 · Accepted Answer · answered Dec 20 '21 at 06:39

You can use a simple regex with re.sub and a function as replacement to check the length of the match:

import re
out = re.sub(r'\(.*?\)|\[.*?\]',
             lambda m: '' if len(m.group())<=(4+2) else m.group()[1:-1],
             text)

Output:

'This is a sentence.  Once a day twice a day '

This gives you the logic for more complex checks, in which case you might want to define a named function rather than a lambda.
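As a rough sketch of the named-function variant mentioned above: the names `BRACKETED`, `replace_bracketed`, and the `max_len` parameter are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above, compiled once (hypothetical name).
BRACKETED = re.compile(r'\(.*?\)|\[.*?\]')

def replace_bracketed(match, max_len=4):
    """Drop short bracketed text entirely; keep longer text without its brackets."""
    inner = match.group()[1:-1]  # strip the opening and closing character
    return '' if len(inner) <= max_len else inner

text = "This is a sentence. (RMVE) (Once a day) [twice a day] [RMV]"
out = BRACKETED.sub(replace_bracketed, text)
print(repr(out))
```

Measuring `inner` directly avoids the `4+2` arithmetic, at the cost of one extra slice per match.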

mozway

Don't forget to remove the double space: `out = re.sub(r"(\s{2})", ' ', out)` – Wesley Cheek Dec 20 '21 at 07:02
Thank you for your answer. I didn't know before that we can pass a function as the replacement to handle matches; it's really helpful to learn. – Santhosh Reddy Dec 20 '21 at 07:10
@mozway, Why is it `<=(4+2)`??? – Ailurophile Dec 20 '21 at 15:56
4 is the length you want and you need to add 2 for the opening and closing bracket/parenthesis – mozway Dec 20 '21 at 18:58
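For what it's worth, the whitespace cleanup suggested in the comments could be sketched like this. It assumes you want every run of whitespace collapsed to a single space and the ends trimmed, which goes a bit further than the `\s{2}` pattern above; `cleaned` is just an illustrative name.

```python
import re

# Result of the substitution above, including the leftover spaces.
out = 'This is a sentence.  Once a day twice a day '

# Collapse runs of two or more whitespace characters, then trim both ends.
cleaned = re.sub(r'\s{2,}', ' ', out).strip()
print(cleaned)
```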

score 2 · Answer 2 · answered Dec 20 '21 at 06:57

How about splitting on [ and looking for ] in each piece, then measuring only the text that comes before the ]:

def remove_text_inside_brackets(string):
    # Treat parentheses as brackets so a single split handles both kinds.
    my_str = string.replace('(', '[').replace(')', ']')
    out = []
    for s in my_str.split('['):
        if ']' in s:
            # Measure only the text before the closing bracket.
            inner, rest = s.split(']', 1)
            s = (inner if len(inner) > 4 else '') + rest
        out.append(s)
    return ''.join(out).strip()

remove_text_inside_brackets(text)

Output:

'This is a sentence.  Once a day twice a day'

score 1 · Answer 3 · answered Dec 20 '21 at 06:31

Someone will hopefully improve on this, but as an alternative, this nested regular expression can work:

re.sub(r'\[([^\]]{5,})\]', r'\g<1>',
       re.sub(r'\(([^)]{5,})\)', r'\g<1>',
              re.sub(r'\[[^\]]{,4}\]', '',
                     re.sub(r'\([^)]{,4}\)', '', text))))

The output of this differs slightly from your expected output. Note the extra spaces after the period and at the end of the line:

'This is a sentence.  Once a day twice a day '

It completely removes the text and its surrounding brackets when the length is 4 or shorter, and replaces the match with just the inner text when the length is 5 or longer.

Note that nested brackets, e.g., ((some text) more text) or [(four)] may fail.
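If nesting does matter, one possible approach is a small stack-based scan instead of regular expressions. This is only a sketch under some assumptions of mine: the length is measured after any inner groups have been resolved, and unmatched or mismatched brackets are left in the text as-is. The name `strip_brackets` and the `max_len` parameter are hypothetical.

```python
def strip_brackets(text, max_len=4):
    """Resolve (...) and [...] groups from the inside out."""
    pairs = {')': '(', ']': '['}
    # Each stack entry is (opening character, collected pieces).
    # The first entry represents the top level, outside any brackets.
    stack = [('', [])]
    for ch in text:
        if ch in '([':
            stack.append((ch, []))
        elif ch in pairs and len(stack) > 1 and stack[-1][0] == pairs[ch]:
            _, pieces = stack.pop()
            inner = ''.join(pieces)
            if len(inner) > max_len:
                stack[-1][1].append(inner)  # keep the text, drop the brackets
            # otherwise drop the text and its brackets
        else:
            stack[-1][1].append(ch)
    # Put back any group that was opened but never closed.
    while len(stack) > 1:
        opener, pieces = stack.pop()
        stack[-1][1].append(opener + ''.join(pieces))
    return ''.join(stack[0][1])

print(repr(strip_brackets("(ab[XX]c)")))
print(repr(strip_brackets("((some text) more text)")))
```

With these rules, `(ab[XX]c)` disappears entirely, which matches what the asker said they would want for that case.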

score 1 · Answer 4 · answered Dec 20 '21 at 06:33

I would just use str.find rather than going character by character; there is too much state to track. Note that this will loop forever if there is an unmatched open paren or open bracket; that's not hard to catch.

text = "This is a sentence. (RMVE) (Once a day) [twice a day] [RMV]"

def remove_text_inside_brackets(text):
    i = 0
    while i >= 0:
        # Try for parens.
        i = text.find('(')
        j = text.find(')')
        if i < 0:
            # No parens, try for brackets.
            i = text.find('[')
            j = text.find(']')
        if i >= 0:
            if j-i > 5:
                text = text[:i] + text[i+1:j] + text[j+1:]
            else:
                text = text[:i] + text[j+1:]
    return text

print(remove_text_inside_brackets(text))
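One way to catch the unmatched case might look like the sketch below. It is a variation on the function above rather than the same code: it searches for the closing character only after the opening one and stops when none is found. The name `remove_text_inside_brackets_safe` and the `max_len` parameter are made up here, and nesting is still not handled.

```python
def remove_text_inside_brackets_safe(text, max_len=4):
    # Handle all parentheses first, then all square brackets.
    for open_ch, close_ch in ('()', '[]'):
        start = 0
        while True:
            i = text.find(open_ch, start)
            if i < 0:
                break  # no more openers of this kind
            j = text.find(close_ch, i + 1)
            if j < 0:
                break  # unmatched opener: leave the rest untouched
            inner = text[i + 1:j]
            replacement = inner if len(inner) > max_len else ''
            text = text[:i] + replacement + text[j + 1:]
            start = i  # rescan from the same position
    return text

text = "This is a sentence. (RMVE) (Once a day) [twice a day] [RMV]"
print(repr(remove_text_inside_brackets_safe(text)))
```

Every pass through the loop removes at least the two bracket characters, so the loop always terminates.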

score 1 · Answer 5 · edited Dec 20 '21 at 07:09

We can use regular expressions to solve this:

import re
text = "This is a sentence. (RMVE) (Once a day) [twice a day] [RMV]"
text = re.sub(r'(\(|\[)[a-zA-Z]{1,4}(\)|\])', '', text)
print(re.sub(r'\[|\]|\(|\)', '', text))

Output: "This is a sentence.  Once a day twice a day " (note the trailing space)

Here, the regular expression matches 1 to 4 letters inside the parentheses or brackets, along with the parentheses or brackets themselves. You can extend the character class to match digits and other special characters too.

edited Dec 20 '21 at 07:09 by Dharman

answered Dec 20 '21 at 07:03 by Santhosh Reddy

This would match a parenthesis with a square bracket, e.g. "(text]" would be removed. Perhaps that is intended, but it's not clear from the question. – 9769953 Dec 20 '21 at 08:27
Also, it only matches 52 letters. A string like "[be4]" will not be removed. – 9769953 Dec 20 '21 at 08:28
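A possible way to address both comments, sketched under the assumption that an opening character should only pair with its own closing character and that any character may appear inside. The names `SHORT`, `LONG`, and `clean` are hypothetical, and empty groups such as `()` are treated as short and removed.

```python
import re

# Up to 4 characters of anything, with matching bracket kinds only.
SHORT = re.compile(r'\([^()]{0,4}\)|\[[^\[\]]{0,4}\]')
# Any remaining well-paired group; exactly one capture group participates.
LONG = re.compile(r'\(([^()]*)\)|\[([^\[\]]*)\]')

def clean(text):
    # First drop short groups together with their brackets...
    text = SHORT.sub('', text)
    # ...then unwrap whatever paired groups are left.
    return LONG.sub(
        lambda m: m.group(1) if m.group(1) is not None else m.group(2),
        text,
    )

print(repr(clean("This is a sentence. (RMVE) (Once a day) [twice a day] [RMV]")))
print(repr(clean("[be4] (text]")))
```

Here `[be4]` is removed because digits are allowed inside, while `(text]` is left alone because the bracket kinds do not match.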
