How do I split a string on commas, but only where the parentheses are balanced?
For example, split "(Small Business (SB), Women-Owned Small Business (WOSB)), (8(a))"
into ["(Small Business (SB), Women-Owned Small Business (WOSB))", "(8(a))"]?
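Problems like this are really hard (impossible?) to solve with Python's built-in re module, which has no support for recursive patterns, so maybe just write a little loop that tracks the nesting depth, something like: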
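def split(s):
    start = 0  # index where the current group begins
    nest = 0   # current parenthesis depth
    for i, char in enumerate(s):
        if char == "(":
            nest += 1
        elif char == ")":
            nest -= 1
        elif char == "," and nest == 0:
            # a comma outside all parentheses ends the current group
            yield s[start:i].strip()
            start = i + 1
    yield s[start:].strip()  # whatever follows the last top-level comma

>>> s = "(Small Business (SB), Women-Owned Small Business (WOSB)), (8(a))"
>>> list(split(s))
['(Small Business (SB), Women-Owned Small Business (WOSB))', '(8(a))']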
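Note that the loop above trusts the input: it never checks that every "(" is eventually closed. If you want the split to fail loudly on unbalanced input, here is a minimal sketch of one way to do it. The name split_balanced and the choice of ValueError are my own assumptions, not part of the answer above.

```python
def split_balanced(s):
    """Split s on top-level commas; raise ValueError if parentheses are unbalanced."""
    parts = []
    start = 0  # index where the current group begins
    nest = 0   # current parenthesis depth
    for i, char in enumerate(s):
        if char == "(":
            nest += 1
        elif char == ")":
            nest -= 1
            if nest < 0:
                # a ")" appeared with no "(" left to close
                raise ValueError(f"unmatched ')' at index {i}")
        elif char == "," and nest == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    if nest != 0:
        # the string ended while still inside parentheses
        raise ValueError(f"{nest} unclosed '(' in input")
    parts.append(s[start:].strip())
    return parts


s = "(Small Business (SB), Women-Owned Small Business (WOSB)), (8(a))"
print(split_balanced(s))
```

It returns a list rather than a generator so that nothing is handed back before the whole string has been validated.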
Similar to wim's answer, but using itertools.groupby:
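from itertools import groupby

def split(s):
    nest = 0  # current parenthesis depth, updated by splitter

    def splitter(c):
        # Key function: True only for a comma outside all parentheses.
        nonlocal nest
        if c == ',':
            return nest == 0
        if c == '(':
            nest += 1
        elif c == ')':
            nest -= 1
        return False

    # groupby yields alternating runs of ordinary characters (key False)
    # and top-level commas (key True); keep only the former.
    return [''.join(g).strip()
            for k, g in groupby(s, splitter)
            if not k]

s = "(Small Business (SB), Women-Owned Small Business (WOSB)), (8(a))"
print(split(s))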
Output:
['(Small Business (SB), Women-Owned Small Business (WOSB))', '(8(a))']
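One thing to be aware of: the groupby version and the explicit-loop version agree on the example, but they differ when two top-level commas sit directly next to each other. groupby merges the adjacent commas into a single run, so no empty field appears between them, while the loop emits an empty string. A small sketch to show it (split_loop and split_groupby are just labels I'm using here for the two approaches):

```python
from itertools import groupby


def split_loop(s):
    # explicit-loop approach: every top-level comma ends a field
    start = 0
    nest = 0
    for i, char in enumerate(s):
        if char == "(":
            nest += 1
        elif char == ")":
            nest -= 1
        elif char == "," and nest == 0:
            yield s[start:i].strip()
            start = i + 1
    yield s[start:].strip()


def split_groupby(s):
    # groupby approach: runs of top-level commas are dropped as one group
    nest = 0

    def splitter(c):
        nonlocal nest
        if c == ',':
            return nest == 0
        if c == '(':
            nest += 1
        elif c == ')':
            nest -= 1
        return False

    return [''.join(g).strip() for k, g in groupby(s, splitter) if not k]


s = "(a),,(b)"
print(list(split_loop(s)))  # ['(a)', '', '(b)']
print(split_groupby(s))     # ['(a)', '(b)']
```

The same goes for an empty input string: the loop yields a single empty field, the groupby version returns an empty list. Pick whichever behavior suits your data.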