6

I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"

I want to remove every pair of brackets together with the characters inside it, and store the result in a new string, like this: new_string = "AX&E"

I tried this:

p = re.compile(r"\(.*?\)", re.DOTALL)
new_string = p.sub("", s)

This gives the output: AX&EUr)

Is there any way to correct this without iterating over each character in the string?

Adam Lear
Jay
  • why did you start another similar one? http://stackoverflow.com/questions/5846576/python-string-manipulation/5846590#5846590 – ghostdog74 May 01 '11 at 08:09
  • @ghostdog74 Prob. because the OP posted a non-nesting example there, and only realized through the answers that he needs to cover nesting as well. – ThomasH May 01 '11 at 09:27
  • Yeah, sorry about that :). I tried editing the previous post, but since I got no replies, I thought I'd make a new post. – Jay May 01 '11 at 12:06

8 Answers

6

Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:

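import re

# Matches one innermost pair: brackets with no brackets inside.
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    # subn returns the new string and the number of substitutions made.
    s, count = p.subn("", s)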

Working example: http://ideone.com/WicDK

John Machin
Kobi
  • 3
    You don't need to do a search, use re.subn() which returns both the new string and a count of substitutions made ... repeat until count is zero. – John Machin May 01 '11 at 09:21
  • @John - Thanks! I was able to write something like http://ideone.com/0zEAO - `count = 1; while count != 0: s, count = p.subn("", s)` - does that seem simple enough? Can/should I shorten it to a one-liner? – Kobi May 01 '11 at 09:36
  • @Kobi: Just do `while count:` instead of `while count != 0:`. – John Machin May 01 '11 at 11:30
  • @Kobi: It can be shortened a little further by hoisting the `.subn` out of the loop. – John Machin May 01 '11 at 11:35
  • @John - Got it with `while count:` - http://ideone.com/WicDK , thanks! But how can `subn` be out of the loop? I tried it in the condition, but didn't get far... – Kobi May 01 '11 at 11:49
  • 1
    @Kobi: Do `xsubn = re.compile(yadda_yadda).subn` then in the loop do `s, count = xsubn('', s)` – John Machin May 01 '11 at 12:27
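A minimal sketch of the hoisting suggested in the comment above, assuming the input string from the question. The name `remove_innermost` is illustrative and not from the thread:

```python
import re

# Bind the compiled pattern's subn method once, outside the loop.
# "remove_innermost" is a made-up name for illustration.
remove_innermost = re.compile(r"\([^()]*\)").subn

s = "AX(p>q)&E((-p)Ur)"
count = 1
while count:
    # Each pass deletes every bracket pair that contains no other brackets,
    # so one level of nesting is peeled off per pass.
    s, count = remove_innermost("", s)

print(s)  # AX&E
```

The loop runs once more than the nesting depth, because the final pass is the one that reports zero substitutions.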
5

You can just use string manipulation, without regular expressions:

>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']

I leave it to you to join the strings up.
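For completeness, joining the pieces might look like the sketch below. The function name `strip_flat_brackets` is hypothetical. Note that this split-based approach only covers non-nested brackets, so it does not give the result the question asks for on the nested input:

```python
def strip_flat_brackets(text):
    # Hypothetical helper: cut the text at each ")" and keep only what
    # precedes the first "(" in each piece.
    return "".join(chunk.split("(")[0] for chunk in text.split(")"))

print(strip_flat_brackets("AX(p>q)&E(qUr)"))     # AX&E
print(strip_flat_brackets("AX(p>q)&E((-p)Ur)"))  # AX&EUr (nesting is not handled)
```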

ghostdog74
4
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'
Daniel Kluev
  • That works. But can you please explain the expression re.compile("""\([^\)]*\)""").sub('', s)? – Jay May 01 '11 at 05:19
  • That regex matches an opening parenthesis, any number of characters that aren’t parentheses, and then a close parenthesis. – Lawrence Velázquez May 01 '11 at 05:24
  • Yes, but that doesn't work for strings with nested brackets, like this: AX(p>q)&E((-p)Ur) – Jay May 01 '11 at 05:33
  • @Jay If you have arbitrary-depth nesting, you should probably reconsider using regexes and go with something like what @ghostdog74 suggested, but taking nesting into account. – Daniel Kluev May 01 '11 at 10:21
  • This doesn't work, you just changed the input string. – xjcl Jun 01 '20 at 12:45
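One way to read the "string manipulation, but taking nesting into account" suggestion above is a single pass with a depth counter. This is only a sketch and does iterate over the characters, which the question hoped to avoid. The function name `strip_brackets` is made up for illustration:

```python
def strip_brackets(text):
    depth = 0   # how many "(" are currently open
    kept = []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            # An unmatched ")" is simply dropped here; that is an arbitrary choice.
            if depth:
                depth -= 1
        elif depth == 0:
            # Keep only characters that are outside all brackets.
            kept.append(ch)
    return "".join(kept)

print(strip_brackets("AX(p>q)&E((-p)Ur)"))  # AX&E
```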
3

Yeah, it should be:

>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile(r"\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'
arussell84
2

Nested brackets (or tags, ...) cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.

It's possible to construct a regex that handles two levels of nesting, but it is already ugly, and three levels will be quite long. And you don't want to think about four levels. ;-)

Achim
  • 2
    This is true for Python's regex engine; however, other implementations like .NET or recent Perl/PHP versions do support recursive regexes. – Tim Pietzcker May 01 '11 at 08:30
  • Can you please provide a pointer to some documentation and/or code sample? Would be very interesting to me! – Achim May 01 '11 at 08:34
  • Try something like this: `\((?:[\w\s]++|(?R))*\)` - http://regexr.com?2tln2 . I'm sure you can find the data if you look for it though. – Kobi May 01 '11 at 08:48
  • Next, the claim that every extra level is extra ugly isn't quite right either. Using the recursive pattern as a base, 0 levels (base): `\([\w\s]*+\)`. 1 level: `\((?:[\w\s]++|\([\w\s]*+\))*\)`. 2 levels: `\((?:[\w\s]++|\((?:[\w\s]++|\([\w\s]*+\))*\))*\)`. Every extra level is a *little* more complex (linearly), and you can easily build that string automatically for any given level of nesting (you basically paste the pattern in place of `(?R)`). – Kobi May 01 '11 at 08:58
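The "build that string automatically" idea from the comment above could be sketched in Python roughly as follows. This is an adaptation, not the commenter's exact patterns: it uses `[^()]` in place of `[\w\s]` and avoids possessive quantifiers (`++`, `*+`), which older versions of Python's `re` module do not accept. The function name `nested_pattern` is hypothetical:

```python
import re

def nested_pattern(levels):
    """Build a regex matching a bracket pair with up to `levels` extra levels of nesting."""
    # Level 0: a pair with no brackets inside.
    pattern = r"\([^()]*\)"
    for _ in range(levels):
        # Wrap the previous pattern: inside a pair, allow either a
        # non-bracket character or a pair one level shallower.
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return pattern

s = "AX(p>q)&E((-p)Ur)"
print(re.sub(nested_pattern(0), "", s))  # AX&E(Ur)
print(re.sub(nested_pattern(1), "", s))  # AX&E
```

The pattern grows linearly with the level, as the comment says, but the maximum depth has to be chosen in advance. Input nested more deeply than that is only partially stripped.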
2

You can use PyParsing to parse the string:

from pyparsing import nestedExpr

s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)

Most code is from: How can a recursive regexp be implemented in python?

Community
Kobi
  • The last two lines are from someone who doesn't know much Python but can use Google. I'm pretty sure the `print` is right... – Kobi May 01 '11 at 07:29
  • 2
    a more condensed and clean way to replace the last two lines is this: `print ''.join(item for item in result if isinstance(item, str))` – Gabi Purcaru May 01 '11 at 07:51
  • Dear downvoter - It's very possible there's an error here somewhere - I don't have Python here and didn't check the *whole* code, just a part on IDEONE, and gave reference to the rest. I don't mind the -2, but I'd appreciate a correction! – Kobi May 01 '11 at 12:13
1

You could use re.subn():

import re

s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)

Output

AX&E
jfs
  • Already suggested in my comment on one of @Kobi's answers over 2.5 hours earlier. – John Machin May 01 '11 at 12:23
  • @John Machin: I know. I saw it after posting the answer. I first encountered that code 4 years ago: http://www.velocityreviews.com/forums/t398464-match-nested-parenthesis.html – jfs May 01 '11 at 12:37
0

This is just how you do it:

# strings
# double and single quotes use in Python
"hey there! welcome to CIP"   
'hey there! welcome to CIP'  
"you'll understand python"          
'i said, "python is awesome!"'      
'i can\'t live without python'      
# use of 'r' before string
print(r"\new code", "\n")    

first = "code in"
last = "python"
first + last     #concatenation

# slicing of strings

user = "code in python!"

print(user)
print(user[5])   # print an element 
print(user[-3])  # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])  
print(user[2:])
print(len(user))   # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # largest character in the string
print(min(user)) # smallest character in the string
print(user.join(["1", "2", "3", "4"])) # join() requires string items

input()
noname