Split string at comma but only if parens are balanced?

Question

How do I split a string at commas, but only at commas where the parentheses are balanced?

For example, how do I split "(Small Business (SB), Women-Owned Small Business (WOSB)), (8(a))" into ["(Small Business (SB), Women-Owned Small Business (WOSB))", "(8(a))"]?

Use the parser corresponding to the formatter that produced the string? — Kelly Bundy, Mar 17 '22 at 19:56

score 1 · Accepted Answer · edited Mar 17 '22 at 21:59


This is really hard (impossible?) to do with regex, so maybe just write a little loop that tracks the nesting depth, something like:

def split(s):
    start = 0
    nest = 0
    for i, char in enumerate(s):
        if char == "(":
            nest += 1
        elif char == ")":
            nest -= 1
        elif char == "," and nest == 0:
            yield s[start:i].strip()
            start = i + 1
    yield s[start:].strip()

s = "(Small Business (SB), Women-Owned Small Business (WOSB)), (8(a))"
print(list(split(s)))

Output:

['(Small Business (SB), Women-Owned Small Business (WOSB))', '(8(a))']
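The loop above assumes the input is well formed. If the parentheses might be unbalanced, a variant along these lines could reject such input instead of silently returning odd pieces. This is only a sketch: the name split_balanced and the choice to raise ValueError are assumptions, not part of the original answer.

```python
def split_balanced(s):
    """Split s at top-level commas; raise ValueError on unbalanced parentheses.

    Hypothetical variant of the loop above, returning a list instead of yielding.
    """
    parts = []
    start = 0
    nest = 0
    for i, char in enumerate(s):
        if char == "(":
            nest += 1
        elif char == ")":
            nest -= 1
            if nest < 0:
                # A ")" appeared with no matching "(" before it.
                raise ValueError(f"unmatched ')' at index {i}")
        elif char == "," and nest == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    if nest != 0:
        # At least one "(" was never closed.
        raise ValueError("unclosed '(' in input")
    parts.append(s[start:].strip())
    return parts
```

With the string from the question this should give the same two groups as the generator version, while something like "(a, b" raises instead of coming back as a single piece.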

answered Mar 17 '22 at 20:00 by wim · edited Mar 17 '22 at 21:59 by spitfiredd

You can't do this with regular expressions for arbitrary strings -- a regular expression isn't powerful enough to determine whether a string contains balanced parentheses. – BrokenBenchmark Mar 18 '22 at 02:49
@BrokenBenchmark According to [this](https://stackoverflow.com/q/546433/674039) it looks like it may be possible, but pretty difficult (and may require a more powerful regex engine than Python's stdlib one). – wim Mar 18 '22 at 03:48
That's largely because some languages have regex engines that allow features that regular expressions (as formally defined) [don't normally have](https://en.wikipedia.org/wiki/Regular_language). For example, the first regular expression in the question you've linked uses a depth counter. – BrokenBenchmark Mar 18 '22 at 03:54
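To illustrate the point made in these comments, here is a small sketch of a commonly seen lookahead trick with the stdlib re module. The pattern is an assumption chosen for illustration, not something from the answers. It copes with one level of parentheses but splits in the wrong places once they nest.

```python
import re

# Split at a comma unless a ")" follows before any other parenthesis.
# This has no notion of depth, so it only understands one level of nesting.
pattern = re.compile(r",(?![^()]*\))")

flat = "(a, b), (c)"
nested = "(Small Business (SB), Women-Owned Small Business (WOSB)), (8(a))"

flat_result = pattern.split(flat)
nested_result = pattern.split(nested)

print(flat_result)    # two pieces: the comma inside "(a, b)" is left alone
print(nested_result)  # three pieces: the comma after "(SB)" is wrongly split
```

The nested case goes wrong because the text after the first comma reaches another "(" before any ")", so the lookahead treats that comma as being outside any parentheses.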

score 1 · Answer 2 · answered Mar 17 '22 at 20:23

Similar to wim's, but using itertools.groupby:

from itertools import groupby

def split(s):
    nest = 0
    def splitter(c):
        nonlocal nest
        if c == ',':
            return nest == 0
        if c == '(':
            nest += 1
        elif c == ')':
            nest -= 1
        return False
    return [''.join(g).strip()
            for k, g in groupby(s, splitter)
            if not k]

s = "(Small Business (SB), Women-Owned Small Business (WOSB)), (8(a))"
print(split(s))

Output:

['(Small Business (SB), Women-Owned Small Business (WOSB))', '(8(a))']
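To see how the groupby version works, it may help to look at the raw groups before they are filtered and stripped. The sketch below uses a hypothetical helper, labelled_runs, that reuses the same key function and keeps the key next to each run of characters.

```python
from itertools import groupby

def labelled_runs(s):
    """Return (key, text) pairs; key is True for runs of top-level commas."""
    nest = 0
    def is_top_level_comma(c):
        nonlocal nest
        if c == ",":
            return nest == 0
        if c == "(":
            nest += 1
        elif c == ")":
            nest -= 1
        return False
    return [(k, "".join(g)) for k, g in groupby(s, is_top_level_comma)]

print(labelled_runs("(a, b), (c)"))
# [(False, '(a, b)'), (True, ','), (False, ' (c)')]
```

One consequence of this design is that adjacent top-level commas end up in a single True group, so an input such as "a,,b" yields two pieces with groupby, whereas the loop in the accepted answer yields an empty string between them.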
