2

Alright, I'm trying to write a simple parser given the following:

d = {
    'a': [1,2,3,4],
    'b': [2,3,4,5],
    'c': [2,4,6,7]
}

And the following two function:

def _and(l1, l2):
    return [i for i in l1 if i in l2]

def _or(l1, l2):
    return list(set(l1+l2))

I am trying to take in a string (e.g. "a||(b&c)") and parse it into the following:

_or(d['a'],_and(d['b'],d['c']))

I've never written a parser before, so I'm a little lost. The parser should support OR, AND and parentheses. Could someone point me in the right direction? A similar example in Python would be great if anyone knows of any.

arshajii
  • 127,459
  • 24
  • 238
  • 287
MatthewKremer
  • 1,549
  • 1
  • 14
  • 25
  • 1
    a typical parser implementation usually starts with a conversion to the reverse polish notation (http://en.wikipedia.org/wiki/Reverse_Polish_notation). The output is usually a stack of separated operands and operators, which is much easier to handle. – njzk2 Jul 29 '13 at 14:37
  • @njzk2 Only if you're writing an evaluator. Here it seems a straightforward AST would work better, since (it seems) Python is providing the evaluation and the parser would generate Python source code. – millimoose Jul 29 '13 at 23:12
  • @millimoose : not entirelly sure i see the difference ? I usually see the RPN as an array representation of a tree, which kinda looks like an AST ? (maybe my RPN is not really an RPN, then) – njzk2 Jul 30 '13 at 07:24
  • @njzk2 RPN is one way of linearizing the AST (by walking it in post-order I think) that lets you then evaluate it by a simple nonrecursive stack-based algorithm. If you already have an expression tree, creating a RPN wouldn't help here. There should be an emphasis on *abstract*, as in you don't really need to build the AST up front, it's possible to write an action-based parser that directly emits RPN. (It's probably covered in most compiler courses.) But if you're already writing the parser, you might as well create an AST which can be converted into the OP's Python code more readily. – millimoose Jul 30 '13 at 14:26
  • @njzk2 Or, to do a "proof-by-contradiction" kind of thing. The RPN for the above expression would be `[a, b, c, AND, OR]` - can you show an algorithm that can convert that array to the OP's Python expression easily? And, more importantly, will this algorithm be simpler than one whose input is the tree `OR(a, AND(b, c))` - which a high-level parser toolkit (like ANTLRv4 or pyparsing) will give you just as easily if not more so than RPN? – millimoose Jul 30 '13 at 14:29

1 Answers1

2

I'll provide a general overview of how you should consider approaching this problem.

You'll need to split your input string into tokens and convert a list of these tokens into a syntax tree. In your case, you should have something along the lines of:

  • a
  • ||
  • (b&c)
    • b
    • &
    • c

You will likely need to apply the same parsing technique to the component within the parenthesis, b&c, to split that into tokens as well (as shown). Hence, this parsing procedure may very well be recursive, so as to handle arbitrarily nested parenthesis. There are existing tools that may be able to help with this, such as ANTLR.

From here, you will want to create a syntax tree based on the operator precedence. In this case, your expression can be illustrated with the following tree:

   OR
  /  \
 a   AND
     / \
    b   c

You can then recursively traverse this tree and perform each node's "operation" based on its children nodes. Evidently, this is all easier said than implemented. One approach that I have taken is to create a Node class, instances of which are used to form the tree. Each Node can have an evaluate method that returns its result. For the leaves a, b and c, the results are simply d['a'], d['b'] and d['c']. For OR and AND, the results are based on the functions _and and _or that you have defined.

arshajii
  • 127,459
  • 24
  • 238
  • 287