I'm trying to use pyparsing to parse function calls in the form:

f(x, y)

That's easy. But since it's a recursive-descent parser, it should also be easy to parse:

f(g(x), y)

That's what I can't get. Here's a boiled-down example:

from pyparsing import Forward, Word, alphas, alphanums, nums, ZeroOrMore, Literal

lparen = Literal("(")
rparen = Literal(")")

identifier = Word(alphas, alphanums + "_")
integer  = Word( nums )

functor = identifier

# allow expression to be used recursively
expression = Forward()

arg = identifier | integer | expression
args = arg + ZeroOrMore("," + arg)

expression << functor + lparen + args + rparen

print expression.parseString("f(x, y)")
print expression.parseString("f(g(x), y)")

And here's the output:

['f', '(', 'x', ',', 'y', ')']
Traceback (most recent call last):
  File "tmp.py", line 14, in <module>
    print expression.parseString("f(g(x), y)")
  File "/usr/local/lib/python2.6/dist-packages/pyparsing-1.5.6-py2.6.egg/pyparsing.py", line 1032, in parseString
    raise exc
pyparsing.ParseException: Expected ")" (at char 3), (line:1, col:4)

Why does my parser interpret the functor of the inner expression as a standalone identifier?

JasonFruit

3 Answers

14

Nice catch on figuring out that identifier was masking expression in your definition of arg. Here are some other tips on your parser:

x + ZeroOrMore(',' + x) is a very common pattern in pyparsing parsers, so pyparsing includes a helper method delimitedList which allows you to replace that expression with delimitedList(x). Actually, delimitedList does one other thing - it suppresses the delimiting commas (or other delimiter if given using the optional delim argument), based on the notion that the delimiters are useful at parsing time, but are just clutter tokens when trying to sift through the parsed data afterwards. So you can rewrite args as args = delimitedList(arg), and you will get just the args in a list, no commas to have to "step over".
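To make the difference concrete, here is a small sketch comparing the hand-written pattern with delimitedList. The element expression `x` and the input string are made up for illustration; the camelCase names follow the ones used in this thread (newer pyparsing releases also provide PEP 8 style names such as parse_string).

```python
from pyparsing import Word, ZeroOrMore, alphas, delimitedList

# hypothetical element expression, just for the comparison
x = Word(alphas)

# hand-written "item, item, item" pattern: the commas stay in the results
manual = x + ZeroOrMore("," + x)

# helper: same matching behavior, but the delimiters are suppressed
helper = delimitedList(x)

manual_tokens = manual.parseString("a, b, c").asList()
helper_tokens = helper.parseString("a, b, c").asList()

print(manual_tokens)
print(helper_tokens)
```

The first list still contains the "," tokens, while the second contains only the items.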

You can use the Group class to create actual structure in your parsed tokens. This will build your nesting hierarchy for you, without having to walk this list looking for '(' and ')' to tell you when you've gone down a level in the function nesting:

 arg = Group(expression) | identifier | integer
 expression << functor + Group(lparen + args + rparen)

Since your args are being Grouped for you, you can further suppress the parens, since like the delimiting commas, they do their job during parsing, but with grouping of your tokens, they are no longer necessary:

lparen = Literal("(").suppress()
rparen = Literal(")").suppress()

I assume 'h()' is a valid function call, just no args. You can allow args to be optional using Optional:

expression << functor + Group(lparen + Optional(args) + rparen)

Now you can parse "f(g(x), y, h())".
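Putting the tips above together, the whole parser might look roughly like the sketch below. It only combines the snippets from this answer with the definitions from the question; the variable name `result` is mine.

```python
from pyparsing import (Forward, Group, Literal, Optional, Word,
                       alphanums, alphas, delimitedList, nums)

# parens are needed while parsing, but suppressed from the results
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()

identifier = Word(alphas, alphanums + "_")
integer = Word(nums)
functor = identifier

# allow expression to be used recursively
expression = Forward()

# try the nested call first so identifier does not mask it
arg = Group(expression) | identifier | integer
args = delimitedList(arg)

# Optional(args) lets a call such as h() have no arguments
expression << functor + Group(lparen + Optional(args) + rparen)

result = expression.parseString("f(g(x), y, h())").asList()
print(result)
```

Each call comes back as a functor followed by a nested list of its arguments, so the nesting of the input is reflected directly in the nesting of the result.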

Welcome to pyparsing!

PaulMcG
  • Thanks for all the helpful comments! This example was actually adapted from the pyparsing documentation; I use most of the techniques you describe in my actual parser. (And the language implementation is now usable after about 6 hours of work. Prototyping in Python with pyparsing is amazingly quick.) – JasonFruit Apr 17 '12 at 18:09
  • What is the difference between `Suppress("(")` and `Literal("(").suppress()`? – dashesy Jan 21 '15 at 21:59
  • No difference whatsoever. `expr.suppress()` returns `Suppress(expr)`, and if a string is passed as the initializer for Suppress, the string gets promoted to a Literal. – PaulMcG Jan 22 '15 at 01:31

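In the definition of arg, any alternative that begins with another alternative has to be listed first. The | operator tries its alternatives from left to right and takes the first one that matches, so with identifier on the left, the functor of a nested call is consumed as a plain identifier and expression is never tried: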

arg = expression | identifier | integer
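A quick way to see the effect is to build the grammar from the question twice, once with each ordering. This is only a sketch; the helper `build` and its `expression_first` flag are hypothetical names introduced for the comparison.

```python
from pyparsing import (Forward, Literal, ParseException, Word, ZeroOrMore,
                       alphanums, alphas, nums)


def build(expression_first):
    """Build the question's grammar with either ordering of arg."""
    lparen = Literal("(")
    rparen = Literal(")")
    identifier = Word(alphas, alphanums + "_")
    integer = Word(nums)
    functor = identifier

    expression = Forward()
    if expression_first:
        # fixed ordering: the nested call is tried before a bare identifier
        arg = expression | identifier | integer
    else:
        # original ordering: identifier matches the inner functor first
        arg = identifier | integer | expression
    args = arg + ZeroOrMore("," + arg)
    expression << functor + lparen + args + rparen
    return expression


fixed_tokens = build(True).parseString("f(g(x), y)").asList()
print(fixed_tokens)

try:
    build(False).parseString("f(g(x), y)")
    broken_failed = False
except ParseException as exc:
    broken_failed = True
    print("original ordering fails:", exc)
```

With expression first, the nested call parses; with the original ordering, the parser stops at the inner "(" just as in the traceback from the question.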
JasonFruit

Paul's answer helped a lot. For posterity, the same approach can be used to define nested block constructs such as if blocks and loops, as follows (a simplified sketch here, to show the structure):

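from pyparsing import (
    Forward, Group, Keyword, Literal, OneOrMore, Word, ZeroOrMore, alphas)

sep = Literal(';')
if_ = Keyword('if')
then_ = Keyword('then')
elif_ = Keyword('elif')
end_ = Keyword('end')

# placeholders for this sketch (not a real grammar): any bare word
# that is not one of the keywords above
word = ~(if_ | then_ | elif_ | end_) + Word(alphas)
other = word
guard = word

# Forward lets a block contain statements that are themselves blocks
if_block = Forward()
do_block = Forward()  # would be defined the same way; omitted here

stmt = other | if_block
stmts = OneOrMore(stmt + sep)

case = Group(guard + then_ + stmts)
# ZeroOrMore, so that an if block without any elif also parses
cases = case + ZeroOrMore(elif_ + case)

if_block << if_ + cases + end_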
0 _