
I would like to use the excellent pyparsing package to parse a Python function call in its most general form. I read one post that was somewhat useful, but it was still not general enough.

I would like to parse the following expression:

f(arg1,arg2,arg3,...,kw1=var1,kw2=var2,kw3=var3,...)

where

  1. arg1, arg2, arg3, ... are any kind of valid Python object (integer, real, list, dict, function, variable name, ...)
  2. kw1, kw2, kw3, ... are valid Python keyword-argument names
  3. var1, var2, var3, ... are valid Python objects

I was wondering whether a grammar could be defined for such a general template. Perhaps I am asking too much. Would you have any ideas?

Thank you very much for your help.

Eric

Eurydice
  • Sure it's possible, there is a full [python grammar implementation](http://pyparsing.wikispaces.com/file/detail/pythonGrammarParser.py) in the pyparsing examples (it looks quite cryptic, but it might help). – bereal Jan 23 '13 at 09:39
  • Depending on what you're doing with it, another option is to use Python's standard library to parse it: see [ast.parse](http://docs.python.org/2/library/ast.html#ast.parse) and the related `NodeVisitor` and `NodeTransformer` classes. – Jon Clements Jan 23 '13 at 10:32
  • That's **not** the most general form. The most general form is `f(arg1, arg2, arg3, ..., kwarg1=val1, kwarg2=val2, ..., *args, **kwargs)`. – Bakuriu Jan 23 '13 at 11:45
  • You're right, Bakuriu, thanks for the comment. – Eurydice Jan 23 '13 at 14:01
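The standard-library route mentioned in the comments can be sketched as follows. This is only a minimal illustration, assuming Python 3.5 or later (where `*args` appears as `ast.Starred` and `**kwargs` as a keyword whose `arg` is `None`); the helper name `describe_call` and the shape of the returned dict are made up for this example.

```python
import ast


def describe_call(source):
    """Parse a single call expression and summarise its arguments.

    Hypothetical helper for illustration; it reports node type names
    rather than evaluating anything.
    """
    call = ast.parse(source.strip(), mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("not a function call: %r" % (source,))

    # Plain positional arguments versus *iterable unpacking
    positional = [a for a in call.args if not isinstance(a, ast.Starred)]
    star_args = [a for a in call.args if isinstance(a, ast.Starred)]

    # kw=value arguments versus **mapping unpacking (keyword.arg is None)
    named = [kw for kw in call.keywords if kw.arg is not None]
    star_kwargs = [kw for kw in call.keywords if kw.arg is None]

    return {
        "name": getattr(call.func, "id", None),  # None for e.g. obj.method(...)
        "positional": [type(a).__name__ for a in positional],
        "keywords": [kw.arg for kw in named],
        "star_args": len(star_args),
        "star_kwargs": len(star_kwargs),
    }


info = describe_call("f(a, [1, 2], *rest, kw1=g(x), **opts)")
print(info)
```

Because `ast` uses Python's own parser, it accepts every valid call form, but it only tells you about Python syntax; a pyparsing grammar is still the way to go if the input is a Python-like mini-language rather than real Python.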

1 Answer

Is that all? Let's start with a simple informal BNF for this:

func_call ::= identifier '(' func_arg [',' func_arg]... ')'
func_arg ::= named_arg | arg_expr
named_arg ::= identifier '=' arg_expr
arg_expr ::= identifier | real | integer | string | dict_literal | list_literal | tuple_literal | func_call
identifier ::= (alpha|'_') (alpha|num|'_')*
alpha ::= some letter 'a'..'z' 'A'..'Z'
num ::= some digit '0'..'9'

Translating to pyparsing, work bottom-up:

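from pyparsing import (Word, alphas, alphanums, Forward, Group,
                       delimitedList, quotedString, unicodeString)

identifier = Word(alphas+'_', alphanums+'_')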

# definitions of real, integer, dict_literal, list_literal, tuple_literal go here
# see further text below

# define a placeholder for func_call - we don't have it yet, but we need it now
func_call = Forward()

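string = quotedString | unicodeString

# try func_call before identifier: alternatives are tried left to right, and a
# bare identifier would match the function name of a nested call and stop at '('
arg_expr = func_call | real | integer | string | dict_literal | list_literal | tuple_literal | identifier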

named_arg = identifier + '=' + arg_expr

# to define func_arg, must first see if it is a named_arg
# why do you think this is?
func_arg = named_arg | arg_expr

# now define func_call using '<<=' instead of '=', to "inject" the definition
# into the previously declared Forward
#
# Group each arg to keep its set of tokens separate, otherwise you just get one
# continuous list of parsed strings, which is almost as worthless as the original
# string
func_call <<= identifier + '(' + delimitedList(Group(func_arg)) + ')'
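To make the steps above concrete, here is a self-contained sketch that can be run without `parsePythonValue.py`. The `real`, `integer` and `list_literal` definitions are simplified, hypothetical stand-ins for the real ones (dict and tuple literals are left out), and `wrong_order_call` exists only to show what happens when `func_arg` tries `arg_expr` before `named_arg`.

```python
from pyparsing import (Forward, Group, Optional, ParseException, Regex,
                       Suppress, Word, alphanums, alphas, delimitedList,
                       quotedString)

identifier = Word(alphas + "_", alphanums + "_")

# Simplified stand-ins (assumptions) for the literals in parsePythonValue.py
real = Regex(r"[+-]?\d+\.\d*")
integer = Regex(r"[+-]?\d+")

func_call = Forward()
arg_expr = Forward()
list_literal = Group(Suppress("[") + Optional(delimitedList(arg_expr)) + Suppress("]"))

# func_call is listed before identifier so that "g(x)" is not cut short at "g"
arg_expr <<= func_call | real | integer | quotedString | list_literal | identifier

named_arg = identifier + "=" + arg_expr
func_arg = named_arg | arg_expr
func_call <<= identifier + "(" + delimitedList(Group(func_arg)) + ")"

# Same call syntax, but with the alternatives of func_arg in the wrong order
wrong_order_call = identifier + "(" + delimitedList(Group(arg_expr | named_arg)) + ")"


def parses(parser, text):
    """Return True if parser accepts the whole of text."""
    try:
        parser.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


for text in ["sin(30)", "hypot(a, b=2)", "len([1,2,3])", "sin(cos(x))"]:
    print(text, "->", func_call.parseString(text, parseAll=True).asList())

# The keyword argument is only accepted when named_arg is tried first
print(parses(func_call, "hypot(a, b=2)"), parses(wrong_order_call, "hypot(a, b=2)"))
```

pyparsing's `|` takes the first alternative that matches and does not come back to try the others if a later part of the expression fails. With `arg_expr` first, the `b` in `b=2` is accepted as a plain identifier, and the parser then finds `=` where it expects `,` or `)`.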

Those arg_expr elements could take a while to work through, but fortunately, you can get them off the pyparsing wiki's Examples page: http://pyparsing.wikispaces.com/file/view/parsePythonValue.py

from parsePythonValue import (integer, real, dictStr as dict_literal, 
                              listStr as list_literal, tupleStr as tuple_literal)

You still might get args passed using *list_of_args or **dict_of_named_args notation. Expand arg_expr to support these:

deref_list = '*' + (identifier | list_literal | tuple_literal)
deref_dict = '**' + (identifier | dict_literal)

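# this expanded arg_expr replaces the earlier one, so it has to be defined
# before named_arg and func_arg are built from it
arg_expr = func_call | real | integer | string | dict_literal | list_literal | tuple_literal | deref_list | deref_dict | identifier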

Write yourself some test cases now - start simple and work your way up to complicated:

sin(30)
sin(a)
hypot(a,b)
len([1,2,3])
max(*list_of_vals)
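A small harness makes it easy to rerun these cases every time the grammar changes. The sketch below is generic: `run_cases` accepts any callable that raises on bad input. Since it has to run on its own, it is demonstrated with `reference_parse`, a stand-in built on the standard library's `ast` module (an assumption for this example, not part of the grammar above). To exercise the pyparsing grammar instead, pass something like `lambda s: func_call.parseString(s, parseAll=True)`.

```python
import ast


def run_cases(parse, cases):
    """Map each test string to None on success or to an error description."""
    report = {}
    for text in cases:
        try:
            parse(text)
        except Exception as exc:  # ParseException, SyntaxError, ValueError, ...
            report[text] = "%s: %s" % (type(exc).__name__, exc)
        else:
            report[text] = None
    return report


def reference_parse(text):
    """Stand-in parser: accept only a single Python call expression."""
    node = ast.parse(text, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call")
    return node


CASES = [
    "sin(30)",
    "sin(a)",
    "hypot(a,b)",
    "len([1,2,3])",
    "max(*list_of_vals)",
]

# Two deliberately bad inputs to show what a failure looks like
report = run_cases(reference_parse, CASES + ["sin(30", "a + b"])
for text, error in report.items():
    print("%-20s %s" % (text, "ok" if error is None else error))
```

Running the same list through both the reference parser and the grammar is a quick way to spot inputs that Python accepts but the grammar does not yet handle.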

Additional argument types that will need to be added to arg_expr (left as further exercise for the OP):

  • indexed arguments : dictval['a'], divmod(10,3)[0], range(10)[::2]

  • object attribute references : a.b.c

  • arithmetic expressions : sin(30), sin(a+2*b)

  • comparison expressions : sin(a+2*b) > 0.5, 10 < a < 20

  • boolean expressions : a or b and not (d or c and b)

  • lambda expression : lambda x : sin(x+math.pi/2)

  • list comprehension

  • generator expression
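As a starting point for two of the items above (attribute references and arithmetic expressions), here is one possible sketch using pyparsing's `infixNotation` helper. It is not a complete solution: `number` and `dotted_name` are simplified, hypothetical definitions, only the four binary operators are covered, and keyword arguments are left out to keep it short.

```python
from pyparsing import (Forward, Group, Optional, Regex, Suppress, Word,
                       alphanums, alphas, delimitedList, infixNotation,
                       oneOf, opAssoc)

identifier = Word(alphas + "_", alphanums + "_")
number = Regex(r"\d+(\.\d*)?")

# a.b.c is kept as one token, e.g. "math.pi"
dotted_name = delimitedList(identifier, delim=".", combine=True)

expr = Forward()

# A call is grouped so its name and arguments stay together
func_call = Group(dotted_name + Suppress("(")
                  + Optional(delimitedList(expr)) + Suppress(")"))

# Try the call first, so that "sin(x)" is not read as the bare name "sin"
operand = func_call | number | dotted_name

# Operator levels are listed from highest to lowest precedence
expr <<= infixNotation(operand, [
    (oneOf("* /"), 2, opAssoc.LEFT),
    (oneOf("+ -"), 2, opAssoc.LEFT),
])

for text in ["sin(a+2*b)", "sin(x+math.pi/2)", "hypot(a.x - b.x, 3)"]:
    print(text, "->", expr.parseString(text, parseAll=True).asList())
```

`infixNotation` nests the result by precedence, so `2*b` ends up in its own sub-list inside the `+` group. Comparison and boolean operators can be added as further levels in the same list; indexing and lambdas need their own expressions.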

Vincent Wen
PaulMcG
  • I tried some of the examples you gave, and the defined grammar failed to parse the `len([1,2,3])` string. It gives a `TypeError: object of type 'int' has no len()` because the `[1,2,3]` list is concatenated into the integer 123. I tried to figure out why. I thought it might be due to the `Optional(Suppress(","))` used for defining listStr, tupleStr and dictStr, but removing them still produces the same error. – Eurydice Feb 03 '13 at 08:43
  • I don't understand why `[1,2,3]` concatenates to an integer - listStr should parse and convert that to the 3-element list `[1,2,3]`. – PaulMcG Feb 03 '13 at 19:34
  • That's what puzzles me too, Paul. If I do `type(eval(func_call.transformString('[1,2,3]')))` I get `<type 'list'>`, but when I do `func_call.transformString('len([1,2,3])')` I get **len(123)** and not **len([1,2,3])** as expected. I will try to dig into this further. If you find the problem in the meantime, you're welcome! – Eurydice Feb 04 '13 at 08:59
  • Pyparsing is no longer hosted on wikispaces.com. Go to https://github.com/pyparsing/pyparsing – PaulMcG Aug 27 '18 at 13:13