Use a custom replacement function:
re.sub(pattern, repl, string, count=0, flags=0)
...
If repl is a function, it is called for every non-overlapping occurrence of pattern.
The function repl is called for every occurrence of a single ; (plus any trailing whitespace) and for every parenthesized expression. Since re.sub does not find overlapping matches and .* is greedy, the very first opening parenthesis triggers a match that runs all the way up to the last closing parenthesis.
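To make the greedy behaviour visible, here is a minimal sketch that records what each call receives. The names logging_repl, seen and sample are hypothetical helpers for illustration only; the pattern is the one used in this answer.

```python
import re

pattern = r'(;\s*|\(.*\))'
sample = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'

seen = []  # collects the text of every match handed to the callback

def logging_repl(m):
    # Hypothetical helper: remember the match and put it back unchanged.
    seen.append(m.group(1))
    return m.group(1)

unchanged = re.sub(pattern, logging_repl, sample)
for piece in seen:
    print(repr(piece))
```

The first recorded piece spans everything from the first ( to the last ), and only the semicolons after it show up as separate matches.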
import re

def repl(m):
    contents = m.group(1)
    # A parenthesized match is put back unchanged.
    if '(' in contents:
        return contents
    # A lone ';' (with trailing whitespace) becomes ';' plus a newline.
    return ';\n'

str1 = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'
str2 = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'
print(re.sub(r'(;\s*|\(.*\))', repl, str1))
print(re.sub(r'(;\s*|\(.*\))', repl, str2))
Result:
for (j=0; j<len; j++) a = (s) + (4);
test = 5;
for (j=0; j<(len); (j++)) a = (s) + (4);
test = 5;
Mission accomplished, at least for your (very small) sample data.
But wait!
A small – but valid – change in one of the examples
str1 = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'
breaks this and produces the wrong output:
for (j=0; j<len; j++) test = 5; a = (s) + (4);
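A quick way to see why this fails is to list what the pattern matches in the changed string. The sketch below also tries a non-greedy variant of the pattern, which is an assumption of mine and not part of the code above, to show that it merely moves the problem to nested parentheses.

```python
import re

def repl(m):
    contents = m.group(1)
    if '(' in contents:
        return contents
    return ';\n'

pattern_greedy = r'(;\s*|\(.*\))'
pattern_lazy = r'(;\s*|\(.*?\))'  # assumption: non-greedy variant, for comparison only

changed = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'
nested = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'

# Greedy: the top-level ';' after 'test = 5' is swallowed by the first match.
greedy_matches = [m.group(1) for m in re.finditer(pattern_greedy, changed)]
print(greedy_matches)

# Non-greedy: the match stops at the first ')', so a ';' inside the
# for-header is treated as if it were at the top level.
lazy_result = re.sub(pattern_lazy, repl, nested)
print(lazy_result)
```

In both cases the pattern has no notion of how deeply nested a semicolon is, which is the piece of information the task needs.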
There is no way around it: you need a state machine that keeps track of the parenthesis depth instead:
def state_match(text):
    parentheses = 0     # current nesting depth of parentheses
    drop_space = False  # True right after a top-level ';' was written
    result = ''
    for character in text:
        if character == '(':
            parentheses += 1
            result += '('
        elif character == ')':
            parentheses -= 1
            result += ')'
        elif character == ' ':
            # Skip the first space seen after a top-level ';'.
            if not drop_space:
                result += ' '
            drop_space = False
        elif character == ';':
            if parentheses:
                # Inside parentheses: keep the ';' as it is.
                result += character
            else:
                # Top level: end the statement with a newline.
                result += ';\n'
                drop_space = True
        else:
            result += character
    return result

str1 = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'
str2 = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'
str3 = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'
print(state_match(str1))
print(state_match(str2))
print(state_match(str3))
This correctly results in:
for (j=0; j<len; j++) a = (s) + (4);
test = 5;
for (j=0; j<(len); (j++)) a = (s) + (4);
test = 5;
for (j=0; j<len; j++) test = 5;
a = (s) + (4);
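The reason the state machine gets all three cases right is that it knows the nesting depth at every semicolon. As a rough illustration, the hypothetical helper semicolon_depths below (not part of the solution itself) reports that depth for each ; in a string.

```python
def semicolon_depths(text):
    """Return the parenthesis depth at each ';' in text (illustrative helper)."""
    depth = 0
    depths = []
    for character in text:
        if character == '(':
            depth += 1
        elif character == ')':
            depth -= 1
        elif character == ';':
            depths.append(depth)
    return depths

str2 = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'
str3 = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'
print(semicolon_depths(str2))
print(semicolon_depths(str3))
```

Only the semicolons at depth 0 end a statement, and those are exactly the ones state_match follows with a newline.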