You can solve this problem using the Scanner class of the re module. I will use the following list of strings as test input:
l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
     'something{now I am wrapped} here {and there} listen',
     'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
     'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']
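For readers who have not used it, here is a minimal sketch of how re.Scanner behaves, independent of the brace problem. Note that re.Scanner is not covered in the official re documentation, so treat this as an illustration of observed behaviour. The callback names on_word and on_space are made up for this example.

```python
import re

def on_word(scanner, token):
    # Whatever a callback returns is appended to the result list.
    return ('WORD', token)

def on_space(scanner, token):
    # Returning None drops the token from the result list.
    return None

# The lexicon is a list of (pattern, callback) pairs, tried at each position.
scanner = re.Scanner([
    (r'[a-z]+', on_word),
    (r'\s+', on_space),
])

# scan() returns the collected results and the unmatched rest of the string.
tokens, remainder = scanner.scan('foo bar 42')
print(tokens)     # [('WORD', 'foo'), ('WORD', 'bar')]
print(remainder)  # 42
```

Scanning stops at the first position where no pattern matches, which is why '42' comes back as the remainder.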
Next, I create a class that tracks how many curly braces are currently open and buffers the text between the outermost pair. It has three methods: one called for an opening curly brace, one for a closing curly brace, and one for the text between braces. Each method behaves differently depending on whether the counter (the opened_cb variable) is zero, that is, whether we are outside any braces:
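class Cb:
    def __init__(self):
        self.results = []    # pieces buffered while inside the outermost braces
        self.opened_cb = 0   # number of currently open curly braces

    def s_text_until_cb(self, scanner, token):
        # Text outside any braces is returned as its own token.
        if self.opened_cb == 0:
            return token
        # Text inside braces is buffered until the outermost brace closes.
        self.results.append(token)
        return None

    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        # Only the outermost opening brace is returned as a token.
        if self.opened_cb == 1:
            return token
        # Nested opening braces are kept as part of the buffered text.
        self.results.append(token)
        return None

    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        # The outermost closing brace flushes the buffer.
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        # Nested closing braces are kept as part of the buffered text.
        self.results.append(token)
        return None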
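To see why a simple counter is enough here, the following sketch traces the nesting depth after each brace. The helper brace_depths is hypothetical and only for illustration; it is not used by the solution.

```python
def brace_depths(text):
    """Return (brace, depth_after_it) pairs for every curly brace in text."""
    depth = 0
    trace = []
    for ch in text:
        if ch == '{':
            depth += 1
            trace.append((ch, depth))
        elif ch == '}':
            depth -= 1
            trace.append((ch, depth))
    return trace

print(brace_depths('a{b{c}d}e'))
# [('{', 1), ('{', 2), ('}', 1), ('}', 0)]
```

Only the opening brace that brings the depth to 1 and the closing brace that brings it back to 0 are emitted as separate tokens. Every other brace is buffered as ordinary text.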
Finally, I create the Scanner and flatten its results into a plain list:
for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))
Here is the complete program, followed by a sample run:
import re

l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
     'something{now I am wrapped} here {and there} listen',
     'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
     'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']


class Cb:
    def __init__(self):
        self.results = []    # pieces buffered while inside the outermost braces
        self.opened_cb = 0   # number of currently open curly braces

    def s_text_until_cb(self, scanner, token):
        if self.opened_cb == 0:
            return token
        self.results.append(token)
        return None

    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        if self.opened_cb == 1:
            return token
        # Keep nested opening braces as part of the buffered text.
        self.results.append(token)
        return None

    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        self.results.append(token)
        return None


for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    # Flatten the [inner_text, '}'] pairs into a single plain list.
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))
Run it with:
python3 script.py
That yields:
Original string --> something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else
Result --> ['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'everything else']
Original string --> something{now I am wrapped} here {and there} listen
Result --> ['something', '{', 'now I am wrapped', '}', ' here ', '{', 'and there', '}', ' listen']
Original string --> something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now I am wrapped {I should {not} cause splitting} I am still wrapped', '}', 'everything']
Original string --> something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped', '}', 'everything']
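If you want to reuse this, the same idea can be wrapped in a single function. This is only a sketch: the name split_outer_braces and the ValueError raised on unbalanced braces are my own additions for illustration, and the program above does not perform that check.

```python
import re


def split_outer_braces(text):
    """Split text on its outermost curly braces, keeping nested ones intact."""
    depth = 0       # number of currently open braces
    buffered = []   # pieces collected while inside the outermost pair

    def on_text(scanner, token):
        if depth == 0:
            return token
        buffered.append(token)
        return None

    def on_open(scanner, token):
        nonlocal depth
        depth += 1
        if depth == 1:
            return token
        buffered.append(token)
        return None

    def on_close(scanner, token):
        nonlocal depth
        if depth == 0:
            # Assumption: treat a stray closing brace as an error.
            raise ValueError('unmatched closing brace')
        depth -= 1
        if depth == 0:
            inner = ''.join(buffered)
            buffered.clear()
            return [inner, token]
        buffered.append(token)
        return None

    scanner = re.Scanner([
        (r'[^{}]+', on_text),
        (r'[{]', on_open),
        (r'[}]', on_close),
    ])
    tokens, _ = scanner.scan(text)
    if depth != 0:
        # Assumption: treat a brace that is never closed as an error.
        raise ValueError('unmatched opening brace')

    # Flatten the [inner_text, '}'] pairs into one plain list.
    flat = []
    for tok in tokens:
        if isinstance(tok, list):
            flat.extend(tok)
        else:
            flat.append(tok)
    return flat


print(split_outer_braces('a{b {c} d}e'))
# ['a', '{', 'b {c} d', '}', 'e']
```

Using closures instead of a class keeps the counter and the buffer local to each call, so the function can be called repeatedly without resetting any state.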