Let's work with your example:
how to construct a regex to flag all non-alphabetic (say [^a-z]) characters except when they occur within parentheses
This problem is a classic case of the technique explained in this question on how to "regex-match a pattern, excluding..."
We can look at two options, depending on whether or not parentheses can be nested.
Option 1: No Nesting
We can use this simple regex:
\([^)]*\)|([^a-z()]+)
The left side of the alternation | matches complete (parenthesized) groups, and we will ignore these matches. The right side matches runs of offending characters and captures them to Group 1. We know they are the right ones because they were not matched by the expression on the left.
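Because Group 1 does not participate whenever the left branch wins, `finditer` can also report where each offending run starts. A minimal sketch; `flag_outside_parens` is a hypothetical helper name, not part of any library:

```python
import re

# Same pattern as above: the left branch consumes "(...)" spans,
# the right branch captures runs of offending characters in Group 1.
pattern = re.compile(r'\([^)]*\)|([^a-z()]+)')

def flag_outside_parens(text):
    """Hypothetical helper: return (start, run) pairs for offending runs
    that occur outside parentheses."""
    flagged = []
    for m in pattern.finditer(text):
        # When the left branch matched, Group 1 did not participate (None).
        if m.group(1) is not None:
            flagged.append((m.start(1), m.group(1)))
    return flagged

print(flag_outside_parens('abc#(x#y)d!'))  # [(3, '#'), (10, '!')]
```

The `#` inside `(x#y)` is never reported, because the left branch consumed the whole parenthesized span before the right branch could see it.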
This program shows how to use the regex (see the results at the bottom of the online demo):
import re

subject = '[]{}&&& ThisIs(OK)'
regex = re.compile(r'\([^)]*\)|([^a-z()]+)')

# findall returns Group 1 for every match; matches of the left branch
# (parenthesized text) yield an empty string, so filter those out
matches = [group for group in regex.findall(subject) if group]

print("\n" + "*** Matches ***")
if matches:
    for match in matches:
        print(match)
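The same idea works for replacement: pass a function to `sub`, return parenthesized matches unchanged, and rewrite only what Group 1 captured. A sketch assuming you want to mask each flagged character with `_`; `mask_outside_parens` and the mask character are illustrative choices, not from the original:

```python
import re

pattern = re.compile(r'\([^)]*\)|([^a-z()]+)')

def mask_outside_parens(text, mask='_'):
    """Hypothetical helper: replace each offending character outside
    parentheses with `mask`, leaving parenthesized text untouched."""
    def repl(m):
        if m.group(1) is None:
            # Left branch matched a "(...)" span: put it back unchanged.
            return m.group(0)
        # Right branch: mask every character of the captured run.
        return mask * len(m.group(1))
    return pattern.sub(repl, text)

print(mask_outside_parens('[]{}&&& ThisIs(OK)'))  # _________his_s(OK)
```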
Option 2: Nested Parentheses
If for any reason parentheses can be nested, use Matthew Barnett's regex module for Python (the built-in re module does not support recursive patterns), substituting this recursive regex on the left side of the | to match the parentheses:
\((?:[^()]++|(?R))*\)
The overall regex therefore becomes:
\((?:[^()]++|(?R))*\)|([^a-z()]+)
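To see why nesting matters, the sketch below runs the Option 1 pattern on nested input, where it misflags text inside the outer parentheses. If installing regex is not an option, a plain depth counter is one stdlib-only alternative. Note that this swaps the regex for a hand-written scanner, assumes balanced parentheses, and uses the hypothetical names `option1_flags` and `flag_with_depth`:

```python
import re

# Option 1 pattern: correct only when parentheses are never nested.
FLAT = re.compile(r'\([^)]*\)|([^a-z()]+)')

def option1_flags(text):
    """Group 1 captures from the Option 1 regex, empty strings dropped."""
    return [g for g in FLAT.findall(text) if g]

def flag_with_depth(text):
    """Hypothetical stdlib-only helper: collect runs of characters outside
    [a-z()] that occur at nesting depth 0 (outside all parentheses).
    Assumes balanced parentheses."""
    flagged, run, depth = [], [], 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth = max(depth - 1, 0)  # ignore a stray ')'
        elif depth == 0 and not 'a' <= ch <= 'z':
            run.append(ch)  # offending character outside parentheses
            continue
        # Anything else (a-z, a parenthesis, or text inside parentheses)
        # ends the current offending run.
        if run:
            flagged.append(''.join(run))
            run = []
    if run:
        flagged.append(''.join(run))
    return flagged

nested = 'ab(c(D)E)F'
print(option1_flags(nested))    # ['E', 'F']: 'E' is a false positive
print(flag_with_depth(nested))  # ['F']
```

With the Option 1 pattern, `[^)]*` stops at the first `)`, so `(c(D)` is consumed and the `E` that still sits inside the outer parentheses gets flagged. On non-nested input both functions agree, e.g. both return `['[]{}&&& T', 'I']` for the earlier subject.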
Reference