
I'd like to use pyparsing to parse an expression of the form expr = '(gimme [some {nested [lists]}])' and get back a Python list of the form [[['gimme', ['some', ['nested', ['lists']]]]]]. Right now my grammar looks like this:

nestedParens = nestedExpr('(', ')')
nestedBrackets = nestedExpr('[', ']')
nestedCurlies = nestedExpr('{', '}')
enclosed = nestedParens | nestedBrackets | nestedCurlies

Presently, enclosed.searchString(expr) returns a list of the form: [[['gimme', ['some', '{nested', '[lists]}']]]]. This is not what I want because it's not recognizing the square or curly brackets, but I don't know why.

Cœur
Derek

2 Answers


Here's a pyparsing solution that uses a self-modifying grammar to dynamically match the correct closing brace character.

from pyparsing import *

data = '(gimme [some {nested, nested [lists]}])'

opening = oneOf("( { [")
nonBracePrintables = ''.join(c for c in printables if c not in '(){}[]')
closingFor = dict(zip("({[",")}]"))
closing = Forward()
# initialize closing with an expression
closing << NoMatch()
closingStack = []
def pushClosing(t):
    closingStack.append(closing.expr)
    closing << Literal( closingFor[t[0]] )
def popClosing():
    closing << closingStack.pop()
opening.setParseAction(pushClosing)
closing.setParseAction(popClosing)

matchedNesting = nestedExpr( opening, closing, Word(alphas) | Word(nonBracePrintables) )

print(matchedNesting.parseString(data).asList())

prints:

[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
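
The closing-stack idea does not depend on pyparsing. As a rough illustration only, here is a minimal plain-Python sketch of the same bookkeeping: every opener pushes the closer it expects, and every closer must match the top of the stack. The names `parse_nested`, `CLOSING_FOR`, and `CLOSERS` are hypothetical, and unlike the grammar above this sketch splits words on whitespace only, so a comma stays attached to its word.

```python
# Hypothetical helper, not part of pyparsing: a stack of expected closers.
CLOSING_FOR = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(CLOSING_FOR.values())

def parse_nested(text):
    """Return nested lists for text whose brackets are correctly matched."""
    root = []
    lists = [root]   # lists currently being filled, innermost last
    expected = []    # closers still owed, innermost last
    word = []        # characters of the word being collected

    def flush_word():
        # Emit the pending word (if any) into the innermost list.
        if word:
            lists[-1].append("".join(word))
            word.clear()

    for ch in text:
        if ch in CLOSING_FOR:
            flush_word()
            new_list = []
            lists[-1].append(new_list)
            lists.append(new_list)
            expected.append(CLOSING_FOR[ch])   # remember the matching closer
        elif ch in CLOSERS:
            flush_word()
            if not expected or expected.pop() != ch:
                raise ValueError("unexpected closing %r" % ch)
            lists.pop()
        elif ch.isspace():
            flush_word()
        else:
            word.append(ch)
    flush_word()
    if expected:
        raise ValueError("missing closing %r" % expected[-1])
    return root

print(parse_nested("(gimme [some {nested [lists]}])"))
# prints: [['gimme', ['some', ['nested', ['lists']]]]]
```

A mismatched pair such as "(a]" raises ValueError instead of being silently accepted, which is the property the self-modifying grammar is after.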

Updated: I posted the above solution because I had actually written it over a year ago as an experiment. I just took a closer look at your original post, and it made me think of the recursive type definition created by the operatorPrecedence method, and so I redid this solution, using your original approach - much simpler to follow! (might have a left-recursion issue with the right input data though, not thoroughly tested):

from pyparsing import *

enclosed = Forward()
nestedParens = nestedExpr('(', ')', content=enclosed) 
nestedBrackets = nestedExpr('[', ']', content=enclosed) 
nestedCurlies = nestedExpr('{', '}', content=enclosed) 
enclosed << (Word(alphas) | ',' | nestedParens | nestedBrackets | nestedCurlies)


data = '(gimme [some {nested, nested [lists]}])' 

print(enclosed.parseString(data).asList())

Gives:

[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
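
For readers who want to see what the recursive definition amounts to, here is a hand-written recursive-descent sketch in plain Python that mirrors the shape of that grammar: an item is a word, a comma, or a bracketed run of items. It is an illustration under assumptions, not how pyparsing works internally. The names `TOKEN`, `PAIRS`, `tokenize`, and `parse_enclosed` are hypothetical, and the tokenizer silently skips any character that is not a letter, comma, or bracket.

```python
import re

# Hypothetical names; this mirrors the grammar's shape, not pyparsing internals.
TOKEN = re.compile(r"[A-Za-z]+|[,()\[\]{}]")
PAIRS = {"(": ")", "[": "]", "{": "}"}

def tokenize(text):
    # Words of letters, commas, and single bracket characters; the rest is skipped.
    return TOKEN.findall(text)

def parse_enclosed(tokens, pos=0):
    """Parse one item starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok not in PAIRS:
        if tok in PAIRS.values():
            raise ValueError("unexpected closing %r" % tok)
        return tok, pos + 1              # a word or a comma
    closer = PAIRS[tok]
    items = []
    pos += 1
    # Collect items until the closer that matches this opener.
    while pos < len(tokens) and tokens[pos] != closer:
        item, pos = parse_enclosed(tokens, pos)
        items.append(item)
    if pos == len(tokens):
        raise ValueError("missing closing %r" % closer)
    return items, pos + 1                # step past the closer

tokens = tokenize("(gimme [some {nested, nested [lists]}])")
result, _ = parse_enclosed(tokens)
print(result)
# prints: ['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]
```

The returned value is the list for the outermost bracket pair; pyparsing's asList() wraps its result in one more list.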

EDITED: Here is a diagram of the updated parser, using the railroad diagramming support coming in pyparsing 3.0. [Image: railroad diagram]

PaulMcG
  • Paul, thank you so much for the informative answer. And thank you even more for creating and open sourcing my new favorite Python library! pyparsing is helping me dramatically reduce the size, complexity, and maintainability of a project I've been working on. – Derek Jan 26 '11 at 07:09
  • If anyone is confused by the << operator used in the updated example, see the documentation of the pyparsing Forward class: https://pythonhosted.org/pyparsing/pyparsing.Forward-class.html – skelliam Sep 03 '20 at 13:57

This should do the trick for you. I tested it on your example:

import re
import ast

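def parse(s):
    # Normalize every bracket type to square brackets.
    s = re.sub(r"[\{\(\[]", '[', s)
    s = re.sub(r"[\}\)\]]", ']', s)
    answer = ''
    for i, char in enumerate(s):
        if char == '[':
            # Open a list and start a quoted token.
            answer += char + "'"
        elif char == ']':
            answer += char
        else:
            answer += char
            # Close the quoted token just before the next bracket.
            if s[i+1] in '[]':
                answer += "', "
    # literal_eval accepts only an expression, not an assignment,
    # so evaluate the built string directly and return the list.
    return ast.literal_eval(answer)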

Comment if you need more

inspectorG4dget
  • Apologies for not being clear enough, but the output I was referring to is a nested Python list, which is a common result of parsing nested expressions with pyparsing. Your solution just returns a string that looks like a printed Python list. Thanks for your help though! – Derek Jan 26 '11 at 05:20
  • @Derek: I'm not returning a string. I'm returning a list. The variable named answer is a string, yes; but that's why there is that line that says exec"s=%s" %answer. This creates a new variable called s, which is a list. This is why my code returns s and not answer. You should check the type of the returned value, and you'll see that it's a list, not a string – inspectorG4dget Jan 26 '11 at 05:39
  • You are returning a list, but I think you've misunderstood what parsing is in this context. When you parse a string, you typically have access to the matched tokens/groups at parse time, allowing you to perform some action on them. Your program just dynamically generates Python code and execs it to transform a string into a nested list. It doesn't parse anything, nor does it use pyparsing as mentioned in the original question. Not to mention it will exec arbitrary Python code, so it would fail on inputs with quotes, for example. – Derek Jan 26 '11 at 06:47
  • All other criticisms aside, you shouldn't be using `exec` like that. At most, you should use `ast.literal_eval`. – jpmc26 Dec 05 '14 at 02:17
  • Dangerous use of exec -- data could run code to delete files on disk, upload sensitive information, etc. – Michael Scott Asato Cuthbert Feb 01 '16 at 14:08