Python - Return all substrings in the first group of nested parentheses

Question

I want to find an efficient way to select all the sub-strings contained in the first group of nested parentheses from a string.

For example:

input: a d f gsds ( adsd ) adsdaa    
output: ( adsd )

input: adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad ) 
output: ( sadad adsads ( adsda ) dsadsa )

input: a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )
output: ( adad ( sad ) sdada asdad )

Note that there can be multiple groups of nested parentheses, and only the first one should be returned.

One solution would be to scan the string character by character, incrementing a counter at each opening parenthesis and decrementing it at each closing one, until the counter returns to 0.
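A minimal sketch of the counting approach described above. The function name `first_group` and the choice to return None for a missing or unclosed group are assumptions made for illustration, not part of the question.

```python
def first_group(s):
    """Return the first balanced parenthesized group in s, or None."""
    start = s.find('(')
    if start == -1:
        return None              # no opening parenthesis at all
    depth = 0
    for i in range(start, len(s)):
        if s[i] == '(':
            depth += 1
        elif s[i] == ')':
            depth -= 1
            if depth == 0:       # the group opened at `start` has just closed
                return s[start:i + 1]
    return None                  # the first group is never closed


print(first_group('adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'))
# ( sadad adsads ( adsda ) dsadsa )
```

This is a single pass over the string after the first opening parenthesis, so it runs in linear time and does not depend on how deep the nesting goes.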

Is there a simpler way to do it, perhaps with regular expressions?

Thanks

You should avoid using `regex`, it could get complicated and error prone. The approach you suggested is your safest bet — sshashank124, Apr 22 '15 at 23:03
Check this out: http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns — sshashank124, Apr 22 '15 at 23:12
sshashank124's link's tl;dr summary is that you can't use regexes to match arbitrarily deep nested parentheses. Assuming you care about parentheses pairing correctness, of course. — bmhkim, Apr 22 '15 at 23:36
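To illustrate the point made in these comments, here is a rough sketch using only the standard `re` module. It builds a pattern for a fixed maximum nesting depth; the helper name `bounded_paren_regex` is hypothetical. The pattern works up to that depth and silently returns the wrong group once the input nests deeper.

```python
import re


def bounded_paren_regex(max_depth):
    """Build a pattern matching parenthesized groups nested at most max_depth levels."""
    pattern = r'\([^()]*\)'                        # one level, no nesting inside
    for _ in range(max_depth - 1):
        # allow either a plain character or one group of the previous depth
        pattern = r'\((?:[^()]|' + pattern + r')*\)'
    return re.compile(pattern)


two_levels = bounded_paren_regex(2)

text = 'adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'
print(two_levels.search(text).group())
# ( sadad adsads ( adsda ) dsadsa )

deep = 'x ( a ( b ( c ) ) )'                       # three levels: too deep for the pattern
print(two_levels.search(deep).group())
# ( b ( c ) )    <- an inner group, not the first outer one
```

Raising `max_depth` only moves the problem: any fixed pattern of this kind has some depth beyond which it stops matching the outermost group.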

score 2 · Accepted Answer · answered Apr 23 '15 at 00:28

I wrote a little function:

def parens(s):
    i = s[s.find('('):s.find(')')].count('(')   # count the '(' up to the first ')'
    groups = s[s.find('('):].split(')')         # split the rest of the string at every ')'
    print(')'.join(groups[:i]) + ')')           # rejoin the first i pieces, restoring each ')'

Demo:

>>> parens('a d f gsds ( adsd ) adsdaa')
( adsd )

>>> parens('adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )')
( sadad adsads ( adsda ) dsadsa )

>>> parens('a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )')
( adad ( sad ) sdada asdad )
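One caveat worth checking before relying on this function: it counts only the opening parentheses that appear before the first closing one, so it appears to assume that all nesting happens before anything closes. The sketch below uses a hypothetical `parens_value` helper (the same logic as `parens`, but returning the string so it can be compared) on an input where the first group contains two sibling groups.

```python
def parens_value(s):
    # Same logic as parens() above, but returning the result instead of printing it.
    i = s[s.find('('):s.find(')')].count('(')
    groups = s[s.find('('):].split(')')
    return ')'.join(groups[:i]) + ')'


print(parens_value('adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'))
# ( sadad adsads ( adsda ) dsadsa )

print(parens_value('x ( a ( b ) ( c ) d ) y'))
# ( a ( b ) ( c )    <- stops early; the full first group is ( a ( b ) ( c ) d )
```

For inputs shaped like the three examples in the question the result is correct; for sibling groups inside the first group, the character-by-character counter described in the question is the safer choice.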

score 1 · Answer 2 · answered Feb 05 '22 at 09:40

You can use pyparsing to select all the sub-strings contained in the first group of nested parentheses from a string.

import pyparsing as pp

pattern = pp.Regex(r'.*?(?=\()') + pp.original_text_for(pp.nested_expr('(', ')'))

txt = 'a d f gsds ( adsd ) adsdaa'
result = pattern.parse_string(txt)[1]
assert result == '( adsd )'

txt = 'adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'
result = pattern.parse_string(txt)[1]
assert result == '( sadad adsads ( adsda ) dsadsa )'

txt = 'a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )'
result = pattern.parse_string(txt)[1]
assert result == '( adad ( sad ) sdada asdad )'

* pyparsing can be installed with `pip install pyparsing`

Note:

If the parentheses in the string are not properly paired (for example `a(b(c)` or `a(b)c)`), you may get an unexpected result or an IndexError, so be careful. (See: Python extract string in a phrase)
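One way to guard against such input is to check that the parentheses pair up before parsing. The following is a minimal standard-library sketch; the name `is_balanced` is hypothetical and it is not part of pyparsing.

```python
def is_balanced(s):
    """Return True if every '(' in s has a matching ')' and vice versa."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:      # a ')' appeared before its '('
                return False
    return depth == 0          # a leftover '(' means something was never closed


for txt in ('a(b(c)', 'a(b)c)', 'a ( b ( c ) d )'):
    print(repr(txt), is_balanced(txt))
# 'a(b(c)' False
# 'a(b)c)' False
# 'a ( b ( c ) d )' True
```

Calling a check like this first lets the caller decide how to handle malformed input instead of discovering it through an exception or a surprising result.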
