As said in the comments, this can't be processed reliably with a regex because the parentheses can be nested.
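To illustrate the problem, here is a small sketch using the sample string from this answer. It assumes Python's built-in re module, which has no recursive patterns. A plain split cuts inside the parentheses, and a simple pattern such as the one below only sees the innermost pair:

```python
import re

sentence = "blah (blah2 (blah3))|blah4 blah5"

# A plain split ignores the parentheses and cuts inside them
naive = sentence.split(" ")
print(naive)  # ['blah', '(blah2', '(blah3))|blah4', 'blah5']

# A non-recursive pattern only matches a pair with no parentheses inside it
innermost = re.findall(r"\([^()]*\)", sentence)
print(innermost)  # ['(blah3)']
```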
An alternative would be some good old string processing with nesting count on parentheses:

    def parenthesis_split(sentence, separator=" ", lparen="(", rparen=")"):
        nb_brackets = 0
        sentence = sentence.strip(separator)  # get rid of leading/trailing seps

        l = [0]  # slice boundaries, starting at the beginning of the string
        for i, c in enumerate(sentence):
            if c == lparen:
                nb_brackets += 1
            elif c == rparen:
                nb_brackets -= 1
            elif c == separator and nb_brackets == 0:
                l.append(i)  # separator outside any parentheses
            # handle malformed string
            if nb_brackets < 0:
                raise Exception("Syntax error")

        l.append(len(sentence))
        # handle missing closing parentheses
        if nb_brackets > 0:
            raise Exception("Syntax error")

        return [sentence[i:j].strip(separator) for i, j in zip(l, l[1:])]

    print(parenthesis_split("blah (blah2 (blah3))|blah4 blah5"))

Result:

    ['blah', '(blah2 (blah3))|blah4', 'blah5']

The list l contains the indexes in the string where a separator occurs outside any parentheses, plus the start and end of the string. In the end, the result is generated by slicing the string between consecutive indexes.
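To make the index list concrete, here is a minimal sketch that only computes the boundaries for the sample string. The helper name split_indexes is hypothetical (it is not part of the answer's function), and the error checks are left out for brevity:

```python
def split_indexes(sentence, separator=" ", lparen="(", rparen=")"):
    """Return 0, the index of each separator outside parentheses, and len(sentence)."""
    depth = 0
    indexes = [0]
    for i, c in enumerate(sentence):
        if c == lparen:
            depth += 1
        elif c == rparen:
            depth -= 1
        elif c == separator and depth == 0:
            indexes.append(i)
    indexes.append(len(sentence))
    return indexes


sentence = "blah (blah2 (blah3))|blah4 blah5"
indexes = split_indexes(sentence)
print(indexes)  # [0, 4, 26, 32]

# Raw slices before any strip(): each slice after the first starts with the separator
raw_slices = [sentence[i:j] for i, j in zip(indexes, indexes[1:])]
print(raw_slices)  # ['blah', ' (blah2 (blah3))|blah4', ' blah5']
```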
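Note the two strip() calls: the one in the final list comprehension removes the separator that every slice after the first begins with, and the one at the start removes leading/trailing separators, which would otherwise create empty items in the returned list. Consecutive separators outside parentheses still show up as empty strings in the result.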
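If runs of consecutive separators should not produce empty items, one possible variant builds the parts character by character and skips empty ones. This is only a sketch under that assumption: the name split_outside_parens is hypothetical, and it raises ValueError rather than a generic Exception.

```python
def split_outside_parens(sentence, separator=" ", lparen="(", rparen=")"):
    """Split on separators outside parentheses, dropping empty items."""
    depth = 0
    parts = []
    current = []
    for c in sentence:
        if c == lparen:
            depth += 1
        elif c == rparen:
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing parenthesis")
        if c == separator and depth == 0:
            # End of an item: keep it only if it is not empty
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(c)
    if depth > 0:
        raise ValueError("missing closing parenthesis")
    if current:
        parts.append("".join(current))
    return parts


print(split_outside_parens("a  (b  c)   d"))  # ['a', '(b  c)', 'd']
```

Separators inside parentheses are kept as they are, so only the unprotected runs are collapsed.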