
I want to find an efficient way to select all the sub-strings contained in the first group of nested parentheses from a string.

For example:

input: a d f gsds ( adsd ) adsdaa    
output: ( adsd )

input: adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad ) 
output: ( sadad adsads ( adsda ) dsadsa )

input: a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )
output: ( adad ( sad ) sdada asdad )

Notice there could be multiple groups of nested parentheses.

One solution would be to scan the string character by character, incrementing a counter at each opening parenthesis and decrementing it at each closing one, until the counter returns to 0.
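The counter-based scan described above can be sketched as follows. This is only a minimal illustration, not code from the thread: the function name `first_paren_group` is hypothetical, and returning `None` when there is no group or the first group never closes is an assumption about the desired behaviour.

```python
def first_paren_group(s):
    """Return the first balanced '(...)' group in s, or None if there is none."""
    start = s.find('(')
    if start == -1:
        return None                      # no opening parenthesis at all
    depth = 0                            # number of currently open parentheses
    for i in range(start, len(s)):
        ch = s[i]
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:               # the first group has just been closed
                return s[start:i + 1]
    return None                          # assumption: an unclosed group yields None


print(first_paren_group('adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'))
```

The scan makes a single pass over the string and stops as soon as the first group closes, so later groups are never inspected.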

I am wondering if there is a simpler way to do it? Maybe with regular expressions?

Thanks

logic
Giuseppe
    You should avoid using `regex`, it could get complicated and error prone. The approach you suggested is your safest bet – sshashank124 Apr 22 '15 at 23:03
    Check this out: http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns – sshashank124 Apr 22 '15 at 23:12
  • sshashank124's link's tl;dr summary is that you can't use regexes to match arbitrarily deep nested parentheses. Assuming you care about parentheses pairing correctness, of course. – bmhkim Apr 22 '15 at 23:36

2 Answers

I wrote a little function:

def parens(s):
    start = s.find('(')
    i = s[start:s.find(')')].count('(')   # number of '(' before the first ')'
    groups = s[start:].split(')')          # split the rest of the string at every ')'
    # Rejoin the first i pieces and restore the final ')'. This assumes the first
    # group nests linearly, with no sibling groups inside it such as ( a ( b ) ( c ) ).
    print(')'.join(groups[:i]) + ')')

Demo:

>>> parens('a d f gsds ( adsd ) adsdaa')
( adsd )

>>> parens('adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )')
( sadad adsads ( adsda ) dsadsa )

>>> parens('a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )')
( adad ( sad ) sdada asdad )
logic

You can use pyparsing to select all the sub-strings contained in the first group of nested parentheses from a string.

import pyparsing as pp

pattern = pp.Regex(r'.*?(?=\()') + pp.original_text_for(pp.nested_expr('(', ')'))

txt = 'a d f gsds ( adsd ) adsdaa'
result = pattern.parse_string(txt)[1]
assert result == '( adsd )'

txt = 'adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'
result = pattern.parse_string(txt)[1]
assert result == '( sadad adsads ( adsda ) dsadsa )'

txt = 'a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )'
result = pattern.parse_string(txt)[1]
assert result == '( adad ( sad ) sdada asdad )'

* pyparsing can be installed with `pip install pyparsing`.

Note:

If the parentheses in the string are unbalanced (for example a(b(c), a(b)c), etc), the result may be unexpected or an IndexError may be raised, so be careful. (See: Python extract string in a phrase)
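One way to guard against the unbalanced case noted above is to check the input before parsing it. The helper below is a hypothetical sketch using only the standard library; the name `is_balanced` is an assumption for illustration and is not part of pyparsing.

```python
def is_balanced(s):
    """Return True if every '(' in s has a matching ')' in the right order."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:        # a ')' appeared before its matching '('
                return False
    return depth == 0            # anything left open means unbalanced


for txt in ('a d f gsds ( adsd ) adsdaa', 'a(b(c', 'a(b)c)'):
    print(repr(txt), is_balanced(txt))
```

Calling such a check first lets the caller skip or report malformed strings instead of relying on the parser's behaviour for them.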

quasi-human