
I want to split a string such as Si(C3(COOH)2)4(H2O)7 into the following:

[Si, (C3(COOH)2), 4, (H2O), 7]

That is, each entire parenthesized expression becomes an element by itself. I've tried a number of different combinations with re.findall() to no avail. Any help is greatly appreciated.
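Python's standard `re` module has no recursion, so no single re.findall() pattern can match arbitrarily nested parentheses. If the nesting is known never to go deeper than two levels, as in the example formula, a pattern that spells out both levels explicitly does work. A minimal sketch under that assumption; the name `TWO_LEVEL` is hypothetical:

```python
import re

# Either a parenthesized group whose contents may include one further level
# of (non-nested) parentheses, or a run of text containing no parentheses.
# Assumption: nesting never goes deeper than two levels.
TWO_LEVEL = re.compile(r'\((?:[^()]|\([^()]*\))*\)|[^()]+')

print(TWO_LEVEL.findall('Si(C3(COOH)2)4(H2O)7'))
# ['Si', '(C3(COOH)2)', '4', '(H2O)', '7']

# Three levels: findall skips characters it cannot match, so the outer
# parentheses are silently dropped instead of raising an error.
print(TWO_LEVEL.findall('A((B(C)))'))
# ['A', '(B(C))']
```

The silent failure on deeper input is the main reason to prefer scanning the string and tracking depth explicitly.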

Asked by user1036197; edited by Terry Jan Reedy.

1 Answer


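You have to scan the string yourself, keeping track of the nesting depth. The significant 'events' are 'at beginning of string', 'at (', 'at )', and 'at end of string'. At each event, check the depth to decide whether a piece has just ended, record that piece, and then update the depth and the start index of the next piece.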

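inn = 'Si(C3(COOH)2)4(H2O)7'
out = ['Si', '(C3(COOH)2)', '4', '(H2O)', '7']  # expected result
res = []   # pieces found so far
beg = 0    # start index of the current piece
dep = 0    # current parenthesis nesting depth
for i, c in enumerate(inn):
    if c == '(':
        if dep == 0 and beg < i:
            res.append(inn[beg:i])  # text before a top-level '('
            beg = i
        dep += 1
    elif c == ')':
        if dep == 0:
            raise ValueError("')' without prior '('")
        elif dep == 1:
            res.append(inn[beg:i+1])  # a complete top-level group
            beg = i+1
        dep -= 1
if dep != 0:
    raise ValueError("'(' without following ')'")
if beg < len(inn):
    res.append(inn[beg:])  # trailing text; skipped if the string ends with ')'
print(res, res == out)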

# prints
# ['Si', '(C3(COOH)2)', '4', '(H2O)', '7'] True
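The same depth-tracking scan can be wrapped in a reusable function and checked against a few edge cases: a string that ends with ')', a string that is a single group, and an empty string. This is a sketch; the function name `split_top_level` is hypothetical, and it raises ValueError on unbalanced parentheses just as the loop above does.

```python
def split_top_level(s):
    """Split s into top-level parenthesized groups and the text between them."""
    parts = []
    start = 0   # index where the current piece begins
    depth = 0   # current nesting depth
    for i, ch in enumerate(s):
        if ch == '(':
            if depth == 0 and start < i:
                parts.append(s[start:i])  # text before a top-level '('
                start = i
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise ValueError(f"')' at index {i} without prior '('")
            depth -= 1
            if depth == 0:
                parts.append(s[start:i + 1])  # a complete top-level group
                start = i + 1
    if depth != 0:
        raise ValueError("'(' without following ')'")
    if start < len(s):
        parts.append(s[start:])  # trailing text after the last group
    return parts


print(split_top_level('Si(C3(COOH)2)4(H2O)7'))  # ['Si', '(C3(COOH)2)', '4', '(H2O)', '7']
print(split_top_level('Ca(OH)2'))               # ['Ca', '(OH)', '2']
print(split_top_level('(H2O)'))                 # ['(H2O)']
print(split_top_level(''))                      # []
```

Checking `start < len(s)` at the end, rather than always appending the remainder, is what keeps an empty string out of the result when the input ends with ')'.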
Answered by Terry Jan Reedy.