12

I want a Python function that takes a string, and returns an array, where each item in the array is either a character, or another array of this kind. Nested arrays are marked in the input string by starting with '(' and ending with ')'.

Thus, the function would act like this:

1) foo("abc") == ["a", "b", "c"]
2) foo("a(b)c") == ["a", ["b"], "c"]
3) foo("a(b(c))") == ["a", ["b", ["c"]]]
4) foo("a(b(c)") == error: closing bracket is missing
5) foo("a(b))c") == error: opening bracket is missing
6) foo("a)b(c") == error: opening bracket is missing

Note: I'd prefer a solution that's purely functional.

Tespa42
  • 567
  • 4
  • 12
  • Use recursion here, it's a perfect fit. Finding a '(' in the token stream means recurse-into. Finding a ')' at the top level invocation means that there is a balance mismatch. – user2246674 Jun 17 '13 at 05:22
  • Recursion indeed. As long as the string isn't too long, that is. Please also bear in mind that this might need a wrapper funstion, since we would need to check whether the number of '(' matches the number of ')'. If so, call the recursive function. If not, give appropriate error message. – Truerror Jun 17 '13 at 05:41
  • Yes, I know that I'll use recursion here. I know how to write a function (that uses a stack) that returns whether the brackets in a string are balanced and well-ordered, but it's taking the next step and trying to return nested arrays that stumps me. This isn't homework. This is part of a bigger parsing program. – Tespa42 Jun 17 '13 at 06:47

7 Answers7

8
def foo(s):
    def foo_helper(level=0):
        try:
            token = next(tokens)
        except StopIteration:
            if level != 0:
                raise Exception('missing closing paren')
            else:
                return []
        if token == ')':
            if level == 0:
                raise Exception('missing opening paren')
            else:
                return []
        elif token == '(':
            return [foo_helper(level+1)] + foo_helper(level)
        else:
            return [token] + foo_helper(level)
    tokens = iter(s)
    return foo_helper()

And,

>>> foo('a((b(c))d)(e)')
['a', [['b', ['c']], 'd'], ['e']]
Jared
  • 25,627
  • 7
  • 56
  • 61
  • Do you wanna try to rewrite this in a functional style? – Tespa42 Jun 17 '13 at 07:02
  • @FinalZero What do you mean? This is already a recursive solution. – Jared Jun 17 '13 at 07:04
  • I mean in a way that doesn't use `iter` and `next`. – Tespa42 Jun 17 '13 at 07:06
  • The answer isn't purely functional, but I'll accept it anyways, because it helped me conceptualize and write the purely functional solution, which requires `foo` to return a tuple instead of just an array. – Tespa42 Jun 17 '13 at 10:54
8

Iterative.

def foo(xs):
    stack = [[]]
    for x in xs:
        if x == '(':
            stack[-1].append([])
            stack.append(stack[-1][-1])
        elif x == ')':
            stack.pop()
            if not stack:
                return 'error: opening bracket is missing'
                #raise ValueError('error: opening bracket is missing')
        else:
            stack[-1].append(x)
    if len(stack) > 1:
        return 'error: closing bracket is missing'
        #raise ValueError('error: closing bracket is missing')
    return stack.pop()

assert foo("abc") == ["a", "b", "c"]
assert foo("a(b)c") == ["a", ["b"], "c"]
assert foo("a(b(c))") == ["a", ["b", ["c"]]]
assert foo("a((b(c))d)(e)") == ['a', [['b', ['c']], 'd'], ['e']]
assert foo("a(b(c)") == "error: closing bracket is missing"
assert foo("a(b))c") == "error: opening bracket is missing"
assert foo("a)b(c") == 'error: opening bracket is missing'
falsetru
  • 357,413
  • 63
  • 732
  • 636
  • +1 And you don't need `result`: just set `stack = [[]]` and then return `stack.pop()`. Seems more pure that way -- just the stack. – FMc Jun 17 '13 at 07:37
2

I would suggest two ways:

Either program your own recusive descent parser like here, or use pyparsing, something like

import pyparsing as pp
expr = pp.Forward()
expr << pp.Word(pp.alphas) + pp.Optional('(' + expr + ')') + pp.Optional(pp.Word(pp.alphas))

here you describe a recursive expression as a sequence of alphas, which can be interleaved by balanced parentheses. When you check this example for the outputs you will see how to get the desired output structure (though it will require some tweaking on your side and require some learning about pyparsing).

regards markus

fricke
  • 1,330
  • 1
  • 11
  • 21
2

Using regex and ast.literal_eval

>>> import re
>>> from ast import literal_eval
>>> def listit(t):
...         return list(map(listit, t)) if isinstance(t, (list, tuple)) else t
... 
def solve(strs):
    s = re.sub(r'[A-Za-z]', "'\g<0>',", strs)
    s = re.sub(r"\)", "\g<0>,", s)
    try: return listit( literal_eval('[' + s  + ']') )
    except : return "Invalid string! "
...     
>>> solve("abc")
['a', 'b', 'c']
>>> solve("a(b)c")
['a', ['b'], 'c']
>>> solve("a(b(c))")
['a', ['b', ['c']]]
>>> solve("a(b(c)")
'Invalid string! '
>>> solve("a)b(c")
'Invalid string! '
>>> solve("a(b))c")
'Invalid string! '
>>> solve('a((b(c))d)(e)')
['a', [['b', ['c']], 'd'], ['e']]
Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
1

One rather quick and nasty approach (just for something different):

import json, re

def foo(x):
    # Split continuous strings
    # Match consecutive characters
    matches = re.findall('[a-z]{2,}', x)
    for m in matches:
        # Join with ","
        x = x.replace(m, '","'.join(y for y in list(m))) 

    # Turn curvy brackets into square brackets
    x = x.replace(')', '"],"')
    x = x.replace('(', '",["')

    # Wrap whole string with square brackets
    x = '["'+x+'"]'

    # Remove empty entries
    x = x.replace('"",', '')
    x = x.replace(',""', '')

    try:
        # Load with JSON
        return json.loads(x)
    except:
        # TODO determine error type
        return "error"

def main():
    print foo("abc")     # ['a', 'b', 'c']
    print foo("a(b)c")   # ['a', ['b'], 'c']
    print foo("a(b(c))") # ['a', ['b', ['c']]]
    print foo("a(b))c")  # error

    print foo('a((b(c))d)(e)') # ['a', [['b', ['c']], 'd'], ['e']]
Alex L
  • 8,748
  • 5
  • 49
  • 75
1
def parse_nested(iterator, level=0):
    result = []
    for c in iterator:
        if c == '(':
            result.append(parse_nested(iterator, level+1))
        elif c == ')':
            if level:
                return result
            else:
                raise ValueError("Opening parenthesis missing")
        else:
            result.append(c)
    if level:
        raise ValueError("Closing parenthesis missing")
    else:
        return result

print parse_nested(iter('a((b(c))d)(e)'))       
Janne Karila
  • 24,266
  • 6
  • 53
  • 94
0

Recursion is something very powerful that you should try to use.

Here is my code:



    # encoding: utf-8
    # Python33

    def check(s):
        cs = [c for c in s if c == '(' or c ==')']
        flag = 0
        for c in cs:
            if flag < 0:
                return 'opening bracket is missing'        
            if c == '(':
                flag += 1
            else:
                flag -= 1
        if flag < 0:
            return 'opening bracket is missing'
        elif flag > 0:
            return 'closing bracket is missing'
        else:
            return ''

    def _foo(cs):
        result = []
        while len(cs):
            c = cs.pop(0)
            if c == '(':
                result.append(_foo(cs))
            elif c == ')':
                return result
            else:
                result.append(c)
        return result

    def foo(s):
        valiad = check(s)
        if valiad:
            return valiad
        cs = list(s)
        return _foo(cs)

    if __name__ == '__main__':
        ss = ["abc","a(b)c","a(b(c))","a(b(c)","a(b))c","a)b(c"]
        for s in ss:
            print(foo(s))

Harry Wang
  • 85
  • 7
  • My code is ok, but error when display in the answer area... I am trying to fix it._______I replaced the '<' whit '<', and it did work! – Harry Wang Jun 17 '13 at 06:26