I have an input string that contains parentheses both inside and outside double quotes. These parentheses can be nested. I want to strip the parenthesized text only where it appears outside double quotes.

I tried the regex r'\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)'. It matches everything enclosed in round brackets, regardless of whether it is inside or outside double quotes.

    import re
    input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes))'''
    result = re.sub(r'\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)','', input_string)
    print(result)

The actual output I am getting is:

'"Hello World "  anything outside round brackets should remain as is'

I expect the output to be:

'"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is'
Shweta
  • Regular expressions can't generally handle nested parentheses, since matching them up is beyond what can be described in a regular language. Now, Python's `re` module (like most major regex libraries) lets you go a bit beyond regular languages, but maybe not far enough for what you need. – Blckknght Jul 17 '19 at 05:41
  • Use [this](https://rextester.com/BSF44753) with Python PyPi regex module, not re. – Wiktor Stribiżew Jul 17 '19 at 06:01
  • Use a parser, not regex. –  Jul 17 '19 at 08:42

2 Answers

If your parentheses are balanced (with the help of this answer):

import re
input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this (String this)'''

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

s = re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), input_string)

print(s)

Prints:

"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is Also remain this 

EDIT: Running some test cases:

import re
input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this ((String this))'''

test_cases = ['Normal string (strip this)',
'"Normal string (dont strip this)"',
'"Normal string (dont strip this)" but (strip this)',
'"Normal string (dont strip this)" but (strip this) and (strip this)',
'"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"',
'"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))',
'"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") ',
]

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

def my_strip(s):
    return re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), s)

for test in test_cases:
    print(test)
    print(my_strip(test))
    print()

Prints:

Normal string (strip this)
Normal string 

"Normal string (dont strip this)"
"Normal string (dont strip this)"

"Normal string (dont strip this)" but (strip this)
"Normal string (dont strip this)" but 

"Normal string (dont strip this)" but (strip this) and (strip this)
"Normal string (dont strip this)" but  and 

"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"
"Normal string (dont strip this)" but  and  but "dont strip (this)"

"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))
"Normal string (dont strip this)" but  and 

"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") 
"Normal string (dont strip this)" but ( but "remain this (xxx)") 

EDIT: To remove all (), even with quoted strings inside them:

import re
input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this ((String this))'''

test_cases = ['"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"',
'"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))',
'"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") ',
]

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

def my_strip(s):
    s = re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), s)
    return re.sub(r'".*?"|(\(.*\))', lambda g: '' if g.group(1) else g.group(), s)

for test in test_cases:
    print(test)
    print(my_strip(test))
    print()

Prints:

"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"
"Normal string (dont strip this)" but  and  but "dont strip (this)"

"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))
"Normal string (dont strip this)" but  and 

"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") 
"Normal string (dont strip this)" but  
Andrej Kesely
  • Thank you Andrej for your answer and also for testing it with different scenarios. I had a doubt on last scenario: "Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") . Here, I want the output to be: '"Normal string (dont strip this)" but ' . Could you please help me with this modification? – Shweta Jul 17 '19 at 08:20
  • I tested the modified code. For input:'Normal_string1 (bracket1"(bracket2)") normal_string2(brakcet3) "normal_string4(bracket4)" ' , the output I am getting is:'Normal_string1 " '. But expected output is:'Normal_string1 normal_string2 "normal_string4(bracket4)" ' . Can you please check this? – Shweta Jul 17 '19 at 10:55
  • @Shweta Python's `re` module isn't strong enough for your use case. There will always be corner cases like these. I recommend using a method other than `re`. – Andrej Kesely Jul 17 '19 at 10:58
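
As one possible way to follow the advice to avoid `re` here, below is a minimal sketch of a character-by-character scanner. It assumes that every double quote toggles a quoted region, that quotes are never escaped, and that a quoted string inside an unquoted parenthesized group should be removed along with the group. The name `strip_outside_quotes` is made up for illustration and does not come from any answer on this page.

```python
def strip_outside_quotes(text):
    """Drop parenthesized groups that are not inside a double-quoted string."""
    out = []
    depth = 0         # nesting depth of parentheses opened outside quotes
    in_quote = False  # True while between a pair of double quotes
    for ch in text:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            if ch == '(':
                depth += 1
                continue  # the opening parenthesis itself is dropped
            if ch == ')' and depth > 0:
                depth -= 1
                continue  # the matching closing parenthesis is dropped
        # Keep the character only when no unquoted parenthesis is open.
        if depth == 0:
            out.append(ch)
    return ''.join(out)


sample = 'Normal_string1 (bracket1"(bracket2)") normal_string2(brakcet3) "normal_string4(bracket4)" '
print(repr(strip_outside_quotes(sample)))
```

Because the scanner only counts depth, it is not limited to a fixed nesting level. Whitespace around a removed group is left in place, so two spaces can remain where a group used to be. An unmatched `)` outside quotes is kept as ordinary text.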

Using the regex module instead of re, you could go with:

"[^"]+"(*SKIP)(*FAIL) # ignore anything between double quotes
|                     # or
\(
    (?:[^()]*|(?R))+  # match nested parentheses
\)

See a demo on regex101.com.


In Python this could be:

import regex as re

data = """"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes))"""

rx = re.compile(r'''
    "[^"]+"(*SKIP)(*FAIL)
    |
    \(
        (?:[^()]*|(?R))+
    \)''', re.VERBOSE)

data = rx.sub("", data)
print(data)

Yielding

"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is
Jan
  • Thank you Jan. This works perfectly with regex module. But I want pattern that works on re module as I do not want to install regex module. It'll be helpful if you can give me a pattern that works with re. – Shweta Jul 17 '19 at 08:26
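
For a `re`-only route, one possible sketch is to reuse the fixed-depth pattern from the question and put a quoted-string alternative next to it, then let a replacement function decide what to keep. This is not the same technique as `(*SKIP)(*FAIL)`, which `re` does not support; it only approximates the effect. It assumes parentheses nest at most three levels deep (the limit of the question's pattern) and that quotes are never escaped. `PATTERN` and `strip_with_re` are hypothetical names chosen for this example.

```python
import re

# The regex engine consumes whichever construct starts first, scanning left
# to right, so parentheses inside an already-matched quoted string are never
# matched on their own.
PATTERN = re.compile(
    r'"[^"]*"'                                    # a double-quoted string
    r'|\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)'  # parentheses, up to three levels deep
)


def strip_with_re(text):
    # Put quoted strings back unchanged; replace parenthesized groups with ''.
    return PATTERN.sub(
        lambda m: m.group(0) if m.group(0).startswith('"') else '',
        text,
    )


data = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes))'''
print(strip_with_re(data))
```

The trade-off is the hard-coded depth: input nested more than three levels deep is only partially stripped, and supporting another level means extending the pattern by hand.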