DISCLAIMER: this post 'works' but should NEVER be used.
So first of all, as I commented earlier, regex isn't meant to be recursive; you may want to use a module like pyparsing if you want to solve this cleanly.
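For comparison, here's a minimal sketch of the non-regex route. It is not pyparsing (I'm not reproducing its API here). It is a plain-Python bracket counter, `find_calls` is a made-up helper name, and it assumes you don't need to care about parentheses inside string literals or comments in your input.

```python
def find_calls(text, name):
    """Return every `name(...)` in `text` whose parentheses are balanced,
    at any nesting depth. Whitespace between `name` and "(" is allowed."""
    results = []
    pos = 0
    while True:
        start = text.find(name, pos)
        if start == -1:
            return results
        i = start + len(name)
        # Skip optional whitespace, like \s* in the regex version.
        while i < len(text) and text[i].isspace():
            i += 1
        end = None
        if i < len(text) and text[i] == "(":
            depth = 0
            for j in range(i, len(text)):
                if text[j] == "(":
                    depth += 1
                elif text[j] == ")":
                    depth -= 1
                    if depth == 0:  # closing bracket of the outer group
                        end = j + 1
                        break
        if end is None:
            pos = start + 1  # no balanced group after this name; keep looking
        else:
            results.append(text[start:end])
            pos = end  # resume after the match, as re.findall does


tests = """
Robert (Iwant(to(**doRegexMyWay(hithere) * 8) / 3) + 1) ; John (whatever())
John(I dont want to anymore())
"""
print(find_calls(tests, "Robert"))  # ['Robert (Iwant(to(**doRegexMyWay(hithere) * 8) / 3) + 1)']
print(find_calls(tests, "John"))  # ['John (whatever())', 'John(I dont want to anymore())']
```

There is no depth limit here and sibling groups are fine, which is exactly what the regex below has to fake.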
If you still desperately want to shoot yourself in the foot and use regex for something it wasn't intended to do, you can use the third-party `regex` module, which supports recursive patterns. Casimir beautifully explained that technique with a fully working recursive regex. I wouldn't recommend doing it this way, but I can't judge your current position.
But hey, why shoot yourself in the foot when you can take the entire leg with it, by only using the built-in `re` module? :D So without further delay, here's to making an unmaintainable mess and keeping your job indefinitely, until they fully rewrite whatever you're making:
```python
import re

n = 25  # levels of nesting allowed; must be fixed up front because re doesn't support recursion
# n optional groups, each nested inside the previous one: "(" non-parens (optional group) non-parens ")"
parensre = r"\([^()]*" + r"(?:\([^()]*" * n + r"[^()]*\))?" * n + r"[^()]*\)"
# re.M and re.S change nothing here (the pattern has no ^, $ or .), but they're harmless
robertre = re.compile(r"Robert\s*" + parensre, re.M | re.S)
johnre = re.compile(r"John\s*" + parensre, re.M | re.S)

tests = """
Robert (Iwant(to(**doRegexMyWay(hithere) * 8) / 3) + 1) ; John (whatever())
John(I dont want to anymore())
"""

print(robertre.findall(tests))  # outputs ['Robert (Iwant(to(**doRegexMyWay(hithere) * 8) / 3) + 1)']
print(johnre.findall(tests))  # outputs ['John (whatever())', 'John(I dont want to anymore())']
```
You can of course mix and combine the parts, with `parensre` being the cornerstone brick of your already collapsing sandcastle. The trick is to create n (25 in the snippet above) optional non-capturing groups, each nested inside the previous one, where a single group is structured like `( non-brackets optional-inner-group non-brackets )`.
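Because each level of `parensre` is wrapped in `(?:...)?`, it accepts at most one inner group, so sibling groups like `John(a(), b())` don't match at all. If you really must stay with `re`, a variant of the same bounded-depth trick swaps the `?` for a `*` so each level takes any number of child groups. This is a sketch, `bounded_parens_re` is a name I made up, and it is still limited to n levels of nesting inside the outer pair.

```python
import re

def bounded_parens_re(n):
    """Hypothetical variant of parensre: at most n nested levels inside the
    outer pair, but any number of sibling groups on each level."""
    inner = r"[^()]*"  # deepest level: no parentheses left
    for _ in range(n):
        # non-parens, then zero or more "(inner)" groups, each followed by non-parens
        inner = r"[^()]*(?:\(" + inner + r"\)[^()]*)*"
    return r"\(" + inner + r"\)"

johnre = re.compile(r"John\s*" + bounded_parens_re(25))
print(johnre.findall("John(a(), b(c())) ; John (x)"))
# ['John(a(), b(c()))', 'John (x)']
```

Each repetition has to start with a literal `(`, so there is only one way for the engine to split the input, which keeps backtracking under control. It's still worse than just counting brackets.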
A taste of the regex it generates:
\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*(?:\([^()]*[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\))?[^()]*\)
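A smaller n is much easier to read. The sketch below wraps the exact `parensre` expression in a hypothetical `make_parens_re` function, prints the n = 2 pattern and shows where the depth limit kicks in: n counts the nested groups inside the outer pair, so n = 2 handles three levels of parentheses but not four.

```python
import re

def make_parens_re(n):
    # Same construction as parensre above, wrapped so the depth can vary.
    return r"\([^()]*" + r"(?:\([^()]*" * n + r"[^()]*\))?" * n + r"[^()]*\)"

small = make_parens_re(2)
print(small)
# \([^()]*(?:\([^()]*(?:\([^()]*[^()]*\))?[^()]*\))?[^()]*\)

print(bool(re.fullmatch(small, "(a(b(c)))")))     # True: outer pair + 2 nested levels
print(bool(re.fullmatch(small, "(a(b(c(d))))")))  # False: one level too deep
```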
TL;DR: please don't ever try to do this with `re`.