1

I have a string like that: 'aaa(cc(kkk)c)ddd[lll]{m(aa)mm}'. From that string I want to get the following structure: ['aaa', '(cc(kkk)c)', 'ddd', '[lll]', '{m(aa)mm}']. In other words I would like to separate substrings that are in brackets of different types.

Roman
  • 124,451
  • 167
  • 349
  • 456
  • Why did your `(cc(kkk)c)` structure change to `(cc{kkk}c)`? – Martijn Pieters Mar 07 '13 at 10:38
  • Sorry, I made a mistake. it should be (cc(kkk)c). – Roman Mar 07 '13 at 10:39
  • Right, so your parenthesis and brackets can be nested. – Martijn Pieters Mar 07 '13 at 10:39
  • Similar question: [Python parsing bracketed blocks](http://stackoverflow.com/q/1651487/222914) – Janne Karila Mar 07 '13 at 10:48
  • @JanneKarila: That question only has to deal with *one* type of bracket. When you have more than one a stack approach is the only way to deal with matching openers with closers properly. – Martijn Pieters Mar 07 '13 at 11:29
  • @MartijnPieters OP did not specify how to deal with mismatched parenthesis. I agree with your answer; it is the sane approach, and it requires a stack, but the requested output from the example could be produced in a simpler way too. – Janne Karila Mar 07 '13 at 11:47
  • @JanneKarila: Sure, that specific example can be produced with a simple counter, but it's a sure bet the OP will come back with a 'but what about the `aa{bb)cc` case where the brackets do not match?' example. :-) – Martijn Pieters Mar 07 '13 at 11:50

3 Answers3

7

You need to use a stack approach to track nesting levels:

pairs = {'{': '}', '[': ']', '(': ')'}

def parse_groups(string):
    stack = []
    last = 0
    for i, c in enumerate(string):
        if c in pairs:
            # push onto the stack when we find an opener
            if not stack and last < i:
                # yield anything *not* grouped
                yield string[last:i]
            stack.append((c, i))
        elif c in pairs:
            if stack and pairs[stack[-1][0]] == c:
                # Found a closing bracket, pop the stack
                start = stack.pop()[1]
                if not stack:
                    # Group fully closed, yield
                    yield string[start:i + 1]
                    last = i + 1
            else:
                raise ValueError('Missing opening parethesis')

    if stack:
        raise ValueError('Missing closing parethesis')

    if last < len(string):
        # yield the tail
        yield string[last:]

This will generate groups, cast to a list if you need one:

>>> list(parse_groups('aaa(cc(kkk)c)ddd[lll]{m(aa)mm}'))
['aaa', '(cc(kkk)c)', 'ddd', '[lll]', '{m(aa)mm}']

If the brackets / parenthesis do not balance, an exception is raised:

>>> list(parse_groups('aa(bb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 19, in parse_groups
ValueError: Missing closing parethesis
>>> list(parse_groups('aa[{bb}}'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 20, in parse_groups
ValueError: Missing opening parethesis
>>> list(parse_groups('aa)bb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 20, in parse_groups
ValueError: Missing opening parethesis
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
1

You could also look at pyparsing. Interestingly, this can be implemented as a stack, where you can push string fragments when you find {[( and pop when you find )]}.

Supreet Sethi
  • 1,780
  • 14
  • 24
0

I think you could try Custom String Parser library (I'm the author of it). It's designed to work with data which has any logical structure, so you can customize it the way you want ;)

Lukas Šalkauskas
  • 14,191
  • 20
  • 61
  • 77