Input text:

text1 = 'xx(aa)(bb)xx'
text2 = 'xx(aa(bb))xx'

Expected output:

('aa', 'bb')
('aa(bb)', 'bb')

My attempt, which does not produce the expected output for the nested case:

re.compile(r'\(\s?(.+?)\s?\)')
Scheinin

1 Answer

You can install the PyPI regex module (pip install regex) and use

import regex

texts = ['xx(aa)(bb)xx', 'xx(aa(bb))xx']
rx = r'\(((?:[^()]++|(?R))*)\)'

for text in texts:
    print(regex.findall(rx, text, overlapped=True))

Output:

['aa', 'bb']
['aa(bb)', 'bb']

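The \(((?:[^()]++|(?R))*)\) regex is a common PCRE-compliant pattern that matches strings between nested, balanced parentheses: [^()]++ possessively consumes any run of non-parenthesis characters, and (?R) recurses into the whole pattern to match an inner balanced pair. I added a capturing group around the contents between the outer parentheses.

By default, findall resumes scanning after the end of each match, so a pair nested inside an already matched pair would be skipped. Passing overlapped=True to regex.findall makes it also return these overlapping (nested) matches.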
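If installing a third-party module is not an option, the same result can be obtained for well-formed input without any regex. The following is a minimal sketch of a different technique (a stack of open-parenthesis positions), not the regex approach above; the function name paren_contents is hypothetical, and its behavior on unbalanced input (unmatched parentheses are simply ignored) is an assumption that may differ from the regex.

```python
def paren_contents(text):
    """Return the contents of every balanced (...) pair,
    ordered by the position of the opening parenthesis."""
    stack = []  # indices of '(' that are still open
    spans = []  # (open_index, close_index) for each completed pair
    for i, ch in enumerate(text):
        if ch == '(':
            stack.append(i)
        elif ch == ')' and stack:
            # Close the most recently opened parenthesis.
            spans.append((stack.pop(), i))
    # Inner pairs close first, so sort to restore left-to-right opening order.
    spans.sort()
    return [text[start + 1:end] for start, end in spans]


for text in ['xx(aa)(bb)xx', 'xx(aa(bb))xx']:
    print(paren_contents(text))
```

For the two sample strings this prints ['aa', 'bb'] and ['aa(bb)', 'bb'], matching the output of the regex solution.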

Wiktor Stribiżew