
Let's say I have a string like this:

This is my (2019) awesome string (that I want to modify)

The date in it has to stay, but without parentheses. Meanwhile everything else that is in parentheses has to go. So I would like to achieve this:

This is my 2019 awesome string

I am able to locate the date using this:

\b(201\d{1})\b

And I am also able to locate anything in parentheses using this:

(\(.*\))

But I want to remove a parenthesized group entirely unless it contains a date; if it does, I want to keep the date and remove only the parentheses. Is there a way to do this without using if/else?
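To see why the two patterns cannot simply be combined, here is a quick check of what each one matches on the example string. This is a minimal sketch using only the standard `re` module. The greedy `.*` in the second pattern runs from the first `(` to the last `)`, so the date group, the text in between, and the second group all end up in a single match.

```python
import re

s = "This is my (2019) awesome string (that I want to modify)"

# The date pattern from the question finds the year on its own.
dates = re.findall(r'\b(201\d{1})\b', s)

# The greedy .* runs from the first "(" to the last ")", so both
# parenthesized groups (and the text between them) become one match.
spans = re.findall(r'(\(.*\))', s)

print(dates)  # ['2019']
print(spans)  # ['(2019) awesome string (that I want to modify)']
```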

milka1117

2 Answers

In Python 3.5+ you may use

import re

s = re.sub(r'\((\d{4})\)|\([^()]*\)', r'\1', s)

If the match is a ( + 4 digits + ), only the 4 digits are kept; otherwise, the whole match is removed.

See the regex demo.

Details

  • \((\d{4})\) - a (, then Capturing group 1 matching four digits, and then a )
  • | - or
  • \([^()]*\) - a (, then zero or more characters other than ( and ), and then a ).

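The replacement is just the \1 backreference to the value of Group 1. When the second alternative matches, Group 1 does not participate, and Python 3.5+ substitutes an empty string for it, so the whole match is removed.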
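A minimal, runnable sketch of this answer, assuming Python 3.5+. The helper name `strip_parens_keep_years` is hypothetical and not part of any library:

```python
import re

# Alternative 1 captures a parenthesized 4-digit number in group 1;
# alternative 2 matches any other innermost (...) group.
PATTERN = re.compile(r'\((\d{4})\)|\([^()]*\)')


def strip_parens_keep_years(text):
    # Hypothetical helper name. In Python 3.5+, \1 expands to '' when
    # group 1 did not participate (i.e. when alternative 2 matched).
    return PATTERN.sub(r'\1', text)


s = "This is my (2019) awesome string (that I want to modify)"
print(repr(strip_parens_keep_years(s)))  # 'This is my 2019 awesome string '
```

The space in front of the removed group is left in place, so the result ends with a trailing space; calling `.strip()` on the result is one simple way to drop it if that matters.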

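NOTE: To use this approach in Python versions before 3.5, you will have to pass a lambda expression as the replacement argument, because those versions raise an "unmatched group" error when the replacement template refers to a group that did not participate in the match: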

s = re.sub(r'\((\d{4})\)|\([^()]*\)', lambda x: x.group(1) if x.group(1) else '', s)
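The same callable-replacement idea written as a named function, which also runs unchanged on Python 3.5+. `keep_year` is an illustrative name only. Because Group 1 is either `None` or a four-digit string, `match.group(1) or ''` behaves the same as the `if`/`else` in the lambda:

```python
import re


def keep_year(match):
    # Group 1 is None when the second alternative matched, so fall back to ''.
    return match.group(1) or ''


s = "This is my (2019) awesome string (that I want to modify)"
result = re.sub(r'\((\d{4})\)|\([^()]*\)', keep_year, s)
print(repr(result))  # 'This is my 2019 awesome string '
```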
Wiktor Stribiżew
  • @Sweeper Right, for Python 3.5 and later versions, the backreference in the replacement pattern will work. However, with older versions, it will still require a lambda expression with `if else`. – Wiktor Stribiżew May 23 '19 at 09:40
  • @WiktorStribiżew out of curiosity, would it be possible to say, replace everything between parenthesis unless it contains 4 digits? Been trying with `re.sub('\(^(?:\d{4})\)','', s)` but no luck. Does `^` not work with non-capturing groups? – yatu May 23 '19 at 09:43
  • @yatu That is the first pattern I wrote when I saw this question :): `re.sub(r'\((?!\d{4}\))[^()]*\)','', s)`, see [this regex demo](https://regex101.com/r/LzWbfM/1). – Wiktor Stribiżew May 23 '19 at 09:45
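A small sketch of the lookahead variant from the comment above, run on the question's string. Unlike the main answer, it keeps the parentheses around the year and removes only the other groups:

```python
import re

s = "This is my (2019) awesome string (that I want to modify)"

# (?!\d{4}\)) rejects a "(" that is immediately followed by exactly four
# digits and a ")"; every other innermost (...) group is removed.
result = re.sub(r'\((?!\d{4}\))[^()]*\)', '', s)
print(repr(result))  # 'This is my (2019) awesome string '
```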
Just do it with two nested calls to re.sub:

import re

re.sub(r' ?\([^()]*\)', '', re.sub(r'\((\d{4})\)', r'\1', my_string))

The inner regex finds 4-digit numbers in parentheses and removes the parentheses. The outer one removes whatever is still in parentheses, along with an optional space before it.
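A sketch of the two-pass approach wrapped in a hypothetical `two_pass` helper. It uses `[^()]*` in the second pass, so separate parenthesized groups are removed one at a time; the last line shows that a greedy `.*` would instead match everything from the first `(` to the last `)`:

```python
import re


def two_pass(text):
    # Pass 1: "(2019)" -> "2019" (only 4-digit groups are unwrapped).
    text = re.sub(r'\((\d{4})\)', r'\1', text)
    # Pass 2: drop each remaining (...) group plus one optional leading space.
    # [^()]* keeps each match inside a single pair of parentheses.
    return re.sub(r' ?\([^()]*\)', '', text)


print(repr(two_pass("This is my (2019) awesome string (that I want to modify)")))
# 'This is my 2019 awesome string'
print(repr(two_pass("a (x) b (2020) c (y)")))
# 'a b 2020 c'

# For comparison, a greedy .* in pass 2 spans from the first "(" to the last ")":
print(repr(re.sub(r' ?\(.*\)', '', "a (x) b 2020 c (y)")))
# 'a'
```

Since Group 1 always participates in the first pass, `r'\1'` there works on any Python version.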