
I am parsing a script and am looking for an effective way to parse braced blocks, each prefixed by a descriptor. The prefixes are arbitrary. I would also like [a, b] to be parsed as a list. Further braced blocks may be nested inside these, so the parser needs to support recursion. If you cannot provide a complete script, please at least point me in the right direction.

Example:

_data{abc; def; gh i; j k l; mn[op, qr]}}
_main{abc; de f; ji(k, l); if(a==b){pass;};}

I would like it parsed into:

{
"_data": ["abc", "def", "gh i", "j k l", ["mn", ["op", "qr"]]],
"_main": ["abc", "de f", ["ji", ["k", "l"]], ["if", "a==b", ["pass"]]]
}
  • Welcome to Stack Overflow. Please read [How to ask a good question](https://stackoverflow.com/help/how-to-ask) and show what you have tried so far to solve the problem. Maybe [this question](https://stackoverflow.com/questions/2945357/python-how-best-to-parse-a-simple-grammar) will provide some insight. – ArturFH Jun 10 '17 at 01:31

1 Answer


Here are a few lines using pyparsing that might get you started:

import pyparsing as pp

# an ident is a character "word" starting with an '_' and followed by alphas
ident = pp.Word('_', pp.alphas, min=2)

# a line is an ident followed by a nested expression using {}'s
line = pp.Group(ident('ident') + pp.nestedExpr('{', '}')("body"))

# cleaned up sample, original post had mismatched {}'s
sample = """\
_data{abc; def; gh i; j k l; mn[op, qr]}
_main{abc; de f; ji(k, l); if(a==b){pass;};}"""

# parse 1 or more lines in the given sample
for parsed in (line*(1,)).parseString(sample):
    parsed.pprint()
    # access the parsed data by name
    print('ident:', parsed.ident)
    print('body:', parsed.body.asList())
    print()

prints:

['_data', ['abc;', 'def;', 'gh', 'i;', 'j', 'k', 'l;', 'mn[op,', 'qr]']]
ident: _data
body: [['abc;', 'def;', 'gh', 'i;', 'j', 'k', 'l;', 'mn[op,', 'qr]']]

['_main', ['abc;', 'de', 'f;', 'ji(k,', 'l);', 'if(a==b)', ['pass;'], ';']]
ident: _main
body: [['abc;', 'de', 'f;', 'ji(k,', 'l);', 'if(a==b)', ['pass;'], ';']]
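
To get from this whitespace-split token list to the nested structure asked for in the question, the body still has to be split on `;` and `,`, with `(...)`, `[...]` and `{...}` handled recursively. One possible way to do that is a small hand-written recursive-descent parser using only the standard library. This is a sketch, not pyparsing code, and the function names (`parse_script`, `parse_block`, `parse_statement`, `parse_items`) are made up for illustration. It also assumes that an argument list with a single item is unwrapped to that item, which is a guess at the rule behind the `["if", "a==b", ["pass"]]` example.

```python
import re

PUNCT = set("{}()[];,")
CLOSERS = {"(": ")", "[": "]"}


def tokenize(text):
    # Split into single punctuation characters and runs of other text,
    # then strip whitespace and drop empty runs.
    raw = re.findall(r"[{}()\[\];,]|[^{}()\[\];,]+", text)
    return [t.strip() for t in raw if t.strip()]


def parse_items(tokens, pos):
    # tokens[pos] is '(' or '['; items are separated by ','.
    closer = CLOSERS[tokens[pos]]
    pos += 1
    items = []
    while pos < len(tokens) and tokens[pos] != closer:
        if tokens[pos] == ",":
            pos += 1
            continue
        item, pos = parse_statement(tokens, pos)
        items.append(item)
    if pos >= len(tokens):
        raise SyntaxError(f"missing {closer!r}")
    return items, pos + 1


def parse_statement(tokens, pos):
    # A statement is some text, optionally followed by an argument list
    # and/or a braced block.
    if tokens[pos] in PUNCT:
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    parts = [tokens[pos]]
    pos += 1
    if pos < len(tokens) and tokens[pos] in CLOSERS:
        items, pos = parse_items(tokens, pos)
        # Assumption: a single-item list collapses to the item itself.
        parts.append(items[0] if len(items) == 1 else items)
    if pos < len(tokens) and tokens[pos] == "{":
        block, pos = parse_block(tokens, pos)
        parts.append(block)
    return (parts[0] if len(parts) == 1 else parts), pos


def parse_block(tokens, pos):
    # tokens[pos] is '{'; statements are separated by ';'.
    pos += 1
    statements = []
    while pos < len(tokens) and tokens[pos] != "}":
        if tokens[pos] == ";":
            pos += 1
            continue
        stmt, pos = parse_statement(tokens, pos)
        statements.append(stmt)
    if pos >= len(tokens):
        raise SyntaxError("missing '}'")
    return statements, pos + 1


def parse_script(text):
    # A script is a sequence of "name{...}" blocks.
    tokens = tokenize(text)
    result, pos = {}, 0
    while pos < len(tokens):
        name = tokens[pos]
        if name in PUNCT or pos + 1 >= len(tokens) or tokens[pos + 1] != "{":
            raise SyntaxError(f"expected 'name{{' at token {pos}: {name!r}")
        result[name], pos = parse_block(tokens, pos + 1)
    return result


sample = """\
_data{abc; def; gh i; j k l; mn[op, qr]}
_main{abc; de f; ji(k, l); if(a==b){pass;};}"""

print(parse_script(sample))
```

For the cleaned-up sample this should produce the dictionary requested in the question. With the original input, the extra `}` on the `_data` line raises a `SyntaxError` instead of being silently accepted.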
PaulMcG