3

Given a string, I would like to know how to strip the last occurrence of (...) together with its contents.

The code below strips every (...) group in the string.

import re
bracketedString     = '*AWL* (GREATER) MINDS LIMITED (CLOSED)'
nonBracketedString  = re.sub(r"\s\(.*?\)", '', bracketedString)
print(nonBracketedString)  # => *AWL* MINDS LIMITED

I would like the following output.

*AWL* (GREATER) MINDS LIMITED
mrzasa
VKB
  • Why would you want to use regular expressions for that? Just locate the last curly and, if found, remove it. – Ulrich Eckhardt Mar 08 '18 at 22:32
  • 1
    Those are not curlies. – Paul Panzer Mar 08 '18 at 22:33
  • It's worth noting that you've just taken the most famous example always given of "things a regular language cannot parse", and asked how to parse them with a regular expression. Of course `re` can do a lot more than an actually regular language, and there may be something about your data that makes this doable even with strict regular expressions (e.g., your parens can't be nested, or can only be nested to depth 3, or whatever), but still, that's a good sign that you may have reached for the wrong tool here. – abarnert Mar 08 '18 at 22:40
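
The non-regex approach suggested in the comments above could look roughly like the sketch below. It assumes the parentheses are not nested, and the helper name `strip_last_parens` is made up for illustration; it also drops the whitespace right before the removed group, which is a choice rather than something the question spells out.

```python
def strip_last_parens(text):
    """Remove the last '(...)' group and the whitespace just before it.

    Assumes parentheses are not nested; returns text unchanged when
    no complete group is found.
    """
    close = text.rfind(")")
    if close == -1:
        return text
    # Search for the opening parenthesis only to the left of the last ")".
    open_ = text.rfind("(", 0, close)
    if open_ == -1:
        return text
    # Drop whitespace immediately preceding the group as well.
    return text[:open_].rstrip() + text[close + 1:]


print(strip_last_parens("*AWL* (GREATER) MINDS LIMITED (CLOSED)"))
# *AWL* (GREATER) MINDS LIMITED
```

Because it works on the last `)` wherever it is, this version also handles a group that is followed by more text.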

2 Answers

7

You can remove a (...) substring, together with any whitespace before it, only when it sits at the very end of the string:

\s*\([^()]*\)$


Details

  • \s* - 0+ whitespace chars
  • \( - a (
  • [^()]* - 0+ chars other than ( and )
  • \) - a )
  • $ - end of string.

See the Python demo:

import re
bracketedString     = '*AWL* (GREATER) MINDS LIMITED (CLOSED)'
nonBracketedString  = re.sub(r"\s*\([^()]*\)$", '', bracketedString)
print(nonBracketedString) # => *AWL* (GREATER) MINDS LIMITED
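
To make the limits of the `$` anchor concrete, here is a small sketch that runs the same pattern over a few inputs. The name `TRAILING_GROUP` and the extra sample strings are only illustrative.

```python
import re

# Same pattern as above: an unnested "(...)" group at the very end.
TRAILING_GROUP = re.compile(r"\s*\([^()]*\)$")

samples = [
    "*AWL* (GREATER) MINDS LIMITED (CLOSED)",       # group at the end: removed
    "*AWL* (GREATER) MINDS LIMITED (CLOSED) END",   # group not at the end: unchanged
    "*AWL* (GREATER) MINDS LIMITED (CLOSED(Jan))",  # nested group: unchanged
]

for s in samples:
    print(repr(TRAILING_GROUP.sub("", s)))
```

Only the first string is modified; the other two come back as they went in, because the pattern needs an unnested group immediately before the end of the string.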

With the PyPI regex module, you can also remove a group that contains nested parentheses at the end of the string:

import regex
s = "*AWL* (GREATER) MINDS LIMITED (CLOSED(Jan))"
res = regex.sub(r'\s*(\((?>[^()]+|(?1))*\))$', '', s)
print(res) # => *AWL* (GREATER) MINDS LIMITED


Details

  • \s* - 0+ whitespaces
  • (\((?>[^()]+|(?1))*\)) - Group 1:
    • \( - a (
    • (?>[^()]+|(?1))* - an atomic group, repeated zero or more times, that matches either 1+ chars other than ( and ) or a recursion into the whole Group 1 pattern
    • \) - a )
  • $ - end of string.
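
If installing the third-party regex module is not an option, a trailing group with nested parentheses can also be handled with a plain backward scan and a depth counter. This is only a sketch using the standard library, not part of the regex approach above; the function name `strip_trailing_group` is hypothetical.

```python
def strip_trailing_group(text):
    """Remove a balanced '(...)' group at the end of text, nested or not.

    Returns text unchanged if it does not end with a balanced group.
    """
    if not text.endswith(")"):
        return text
    depth = 0
    # Walk backwards until we reach the "(" that opens the final group.
    for i in range(len(text) - 1, -1, -1):
        if text[i] == ")":
            depth += 1
        elif text[i] == "(":
            depth -= 1
            if depth == 0:
                # Also drop the whitespace in front of the group.
                return text[:i].rstrip()
    return text  # unbalanced parentheses: leave the input alone


print(strip_trailing_group("*AWL* (GREATER) MINDS LIMITED (CLOSED(Jan))"))
# *AWL* (GREATER) MINDS LIMITED
```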
Wiktor Stribiżew
  • This works fine, but I would like to remove everything inside the last pair of parentheses. For example, given '*AWL* (GREATER) MINDS LIMITED (CLOSED(Jan))', it does not work. – VKB Mar 22 '18 at 12:35
  • @VKB See an updated answer with a nested parentheses removal at the end of string example. – Wiktor Stribiżew Mar 22 '18 at 12:59
1

If you want to remove the last occurrence of brackets even when they are not at the end of the string:

*AWL* (GREATER) MINDS LIMITED (CLOSED) END

you can use a tempered greedy token:

>>> re.sub(r"\([^)]*\)(((?!\().)*)$", r'\1', '*AWL* (GREATER) MINDS LIMITED (CLOSED) END')                        
# => '*AWL* (GREATER) MINDS LIMITED  END'  


Explanation:

  • \([^)]*\) matches a string in brackets
  • (((?!\().)*)$ ensures that there is no other opening bracket before the end of the string

    • (?!\() is a negative lookahead checking that the next char is not (
    • . matches the next char (which cannot be ( because of the negative lookahead)
    • ((?!\().)* repeats that pair up to the end of the string ($), and the outer capturing group keeps the matched text
  • the match is replaced with the first capturing group (\1), which holds the text that follows the brackets
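
As a rough sketch, the same pattern can be wrapped in a small helper and tried on inputs with and without trailing text. The names `LAST_GROUP` and `remove_last_group` are made up for this example, and nested brackets are assumed not to occur.

```python
import re

# "(...)" followed by a tail that contains no further "(" up to the end.
LAST_GROUP = re.compile(r"\([^)]*\)(((?!\().)*)$")


def remove_last_group(text):
    """Drop the last unnested '(...)' group, keeping the text after it."""
    return LAST_GROUP.sub(r"\1", text, count=1)


print(repr(remove_last_group("*AWL* (GREATER) MINDS LIMITED (CLOSED) END")))
# '*AWL* (GREATER) MINDS LIMITED  END'
print(repr(remove_last_group("*AWL* (GREATER) MINDS LIMITED (CLOSED)")))
# '*AWL* (GREATER) MINDS LIMITED '
```

The whitespace in front of the removed group is left in place, which is why a double space or a trailing space remains; prefixing the pattern with \s* would consume it.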
mrzasa