
I am trying to extract all words within nested parentheses by using regex. Here is an example of my .txt file:

hello ((

(alpha123_4rf)
45beta_Frank))
Red5Great_Sam_Fun

I have tried this with regex:

r'[\((?\(??(^\()?\))]'

but have not been able to get the desired output. I want my output to be like this:

((

(alpha123_4rf)
45beta_Frank))

What am I doing wrong? Any help is greatly appreciated!
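A quick check shows why the attempt above does not work. The whole pattern is wrapped in `[...]`, so it is a single character class. It matches one character at a time from the set `(`, `)`, `?`, `^` rather than a nested group. Here is a small sketch using the sample text from the question (the variable names are illustrative):

```python
import re

# Sample text from the question
text = "hello ((\n\n(alpha123_4rf)\n45beta_Frank))\nRed5Great_Sam_Fun\n"

# The attempted pattern: everything sits inside [...], so it is one
# character class that matches a single '(', ')', '?' or '^' each time
attempt = r'[\((?\(??(^\()?\))]'
print(re.findall(attempt, text))  # ['(', '(', '(', ')', ')', ')']
```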

Michał Turczyn
  • If you want any number of nested parentheses, you'll need a subexpression call, but that is not supported by the `re` module. You can use the third-party `regex` module, though. – Sundeep Nov 16 '19 at 08:41
  • @Sundeep Hm... what is a third-party `regex`? Sorry, I am a newbie in Python. –  Nov 16 '19 at 08:43
  • See https://pypi.org/project/regex/ – Sundeep Nov 16 '19 at 08:44
  • Also see https://stackoverflow.com/a/12280660/5527985 – bobble bubble Nov 16 '19 at 10:26
  • Yeah, there's a solution in the linked question above that uses the `regex` module. `regex.findall(r'\((?:[^()]++|(?0))++\)', s)` will work for any level of nesting. – Sundeep Nov 17 '19 at 09:27
  • An alternative version using `pyparsing` is here: https://stackoverflow.com/questions/29810464/python-return-all-substrings-in-the-first-group-of-nested-parentheses/70996791#70996791 – quasi-human Feb 05 '22 at 09:50
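Sundeep's first comment is the key point. Python's built-in `re` module has no recursive patterns, so no single `re` expression handles arbitrary nesting depth. Without installing `regex`, one stdlib-only workaround is to scan the text and count the parenthesis depth. This is a different technique, not a regex. Below is a minimal sketch; the function name `extract_balanced` is hypothetical:

```python
def extract_balanced(text):
    """Hypothetical helper: return every top-level balanced (...) group,
    including any parentheses nested inside it."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i  # remember where this outermost group opens
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                # the outermost group just closed: slice it out whole
                groups.append(text[start:i + 1])
        # a ')' seen at depth 0 is unmatched and is simply skipped

    # an '(' that is never closed produces no group
    return groups


sample = "hello ((\n\n(alpha123_4rf)\n45beta_Frank))\nRed5Great_Sam_Fun\n"
print(extract_balanced(sample))
print(extract_balanced("a (b (c) d) e (f)"))  # ['(b (c) d)', '(f)']
```

Unmatched `)` characters are ignored, and an unclosed `(` yields nothing. That may or may not be what you want for messier input.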

2 Answers


Try this pattern: `(?s)\([^(]*\((.+)\)[^)]*\)`

Explanation:

`(?s)` - flag: single-line (DOTALL) mode, so `.` also matches newline characters

`\(` - match `(` literally

`[^(]*` - match zero or more characters other than `(`

`\(` - match `(` literally

`(.+)` - match one or more of any characters (greedily) and store them in the first capturing group

`\)` - match `)` literally

`[^)]*` - match zero or more characters other than `)`

`\)` - match `)` literally

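Here is a small sketch that runs this pattern on the sample text from the question with `re.search`. Group 0 (the whole match) is the desired output. Group 1 holds only what `(.+)` captured between the second `(` and the first of the two closing `)`. Because `(.+)` is greedy, a single match can stretch across several such groups if the text contains more than one.

```python
import re

# Sample text from the question
text = "hello ((\n\n(alpha123_4rf)\n45beta_Frank))\nRed5Great_Sam_Fun\n"

pattern = r"(?s)\([^(]*\((.+)\)[^)]*\)"
m = re.search(pattern, text)
if m:
    print(m.group(0))        # whole match, outer parentheses included
    print(repr(m.group(1)))  # only the inner capturing group
```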

Michał Turczyn
  • Thanks for the detailed explanation Michal. It works! –  Nov 16 '19 at 09:26
  • Hey Michal, what if my nested brackets are inconsistent? Let's say I have another one like this: `(Select xxxxx, xxxx(xx) as xxxxxx, xxxx(xyzz) as kkk, cas((hello((cas(xxx as abc 'yxys')) as afkk from kjdkj and hsx gkgkg = rate'98209''; ); quit;` –  Nov 18 '19 at 02:02

If the parentheses directly follow each other, this simpler solution also works:

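```python
import re

def find_brackets(text):
    # (?s) lets '.' match newlines; the pattern requires literal '((' ... '))'
    rx = r"(?s)\(\((.+)\)\)"
    z = re.search(rx, text)
    # z[0] is the whole match, including the outer '((' and '))'
    return z[0] if z else ''
```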
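To see what "directly following each other" means in practice, here is a sketch that compares the two patterns from this page on a hypothetical input with spaces between the parentheses. The variable names are illustrative. The first answer's pattern allows other characters between the opening and closing parentheses, but this one needs a literal `((` and `))`.

```python
import re

# The two patterns from the answers on this page
spaced_pattern = r"(?s)\([^(]*\((.+)\)[^)]*\)"
adjacent_pattern = r"(?s)\(\((.+)\)\)"

# Hypothetical input: spaces separate the parentheses
text = "( (alpha123_4rf) )"

spaced = re.search(spaced_pattern, text)
adjacent = re.search(adjacent_pattern, text)

print(spaced.group(0) if spaced else None)      # ( (alpha123_4rf) )
print(adjacent.group(0) if adjacent else None)  # None: no '((' in the text
```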
Thomas Bobek