I have string in the form 'AB(AB(DDC)C)A(BAAC)DAB(ABC)'
.
- Each character represents an element (
A
,B
,C
orD
). - Between parentheses, on the right, there is the child of each element (which may be absent).
In example, having 'AB(AB(DDC)C)A(BAAC)DA'
, the top level would be AB(AB(DDC)C)A(BAAC)DA --> [A, B, A, D, A]
and the corresponding children would be [None, AB(DDC)C, BAAC, None, None]
. The children are to be parsed as well recursively.
I have implemented a solution here:
def parse_string(string):
i = 0
parsed = []
while i < len(string):
if string[i] in ('A', 'B', 'C', 'D'):
parsed.append([string[i], None])
i += 1
elif string[i] == '(':
open_brakets = 1
i += 1
j = i
while open_brakets:
if string[j] == '(':
open_brakets += 1
elif string[j] == ')':
open_brakets -= 1
j += 1
# Parse the children as well
parsed[-1][-1] = parse_string(string[i:j - 1])
i = j
else:
i += 1
return parsed
print parse_string('AB(AB(DDC)C)A(BAAC)DAB(ABC)')
Although I think it's a bit ugly and I'm sure it is not very efficient.
I wonder if there's a way to make this with Python in a cleaner/faster/more elegant way? Using external libraries is allowed (specially if they're written in C! :-P).
Update
Other examples of strings that should work:
ABC(DAB(ACB)BBB(AAA)ABC)DCB
In general, the length of the string is not limited, neither the number of children, nor their length, nor the number of nested levels.