Question
Is there any way to declare function arguments as non-strict (passed by-name)?
If this is not possible directly: are there any helper functions or decorators that help me achieve something similar?
Concrete example
Here is a littly toy-example to experiment with.
Suppose that I want to build a tiny parser-combinator library that can cope with the following classic grammar for arithmetic expressions with parentheses (numbers replaced by a single literal value 1
for simplicity):
num = "1"
factor = num
| "(" + expr + ")"
term = factor + "*" + term
| factor
expr = term + "+" + expr
| term
Suppose that I define a parser combinator as an object that has a method parse
that can take list of tokens, current position, and either throw a parse error, or return a result and a new position. I can nicely define a ParserCombinator
base class that provides +
(concatenation) and |
(alternative). Then I can define parser combinators that accept constant strings, and implement +
and |
:
# Two kinds of errors that can be thrown by a parser combinator
class UnexpectedEndOfInput(Exception): pass
class ParseError(Exception): pass
# Base class that provides methods for `+` and `|` syntax
class ParserCombinator:
def __add__(self, next):
return AddCombinator(self, next)
def __or__(self, other):
return OrCombinator(self, other)
# Literally taken string constants
class Lit(ParserCombinator):
def __init__(self, string):
self.string = string
def parse(self, tokens, pos):
if pos < len(tokens):
t = tokens[pos]
if t == self.string:
return t, (pos + 1)
else:
raise ParseError
else:
raise UnexpectedEndOfInput
def lit(str):
return Lit(str)
# Concatenation
class AddCombinator(ParserCombinator):
def __init__(self, first, second):
self.first = first
self.second = second
def parse(self, tokens, pos):
x, p1 = self.first.parse(tokens, pos)
y, p2 = self.second.parse(tokens, p1)
return (x, y), p2
# Alternative
class OrCombinator(ParserCombinator):
def __init__(self, first, second):
self.first = first
self.second = second
def parse(self, tokens, pos):
try:
return self.first.parse(tokens, pos)
except:
return self.second.parse(tokens, pos)
So far, everything is fine. However, because the non-terminal symbols of the grammar are defined in a mutually recursive fashion, and I cannot eagerly unfold the tree of all possible parser combinations, I have to work with factories of parser combinators, and wrap them into something like this:
# Wrapper that prevents immediate stack overflow
class LazyParserCombinator(ParserCombinator):
def __init__(self, parserFactory):
self.parserFactory = parserFactory
def parse(self, tokens, pos):
return self.parserFactory().parse(tokens, pos)
def p(parserFactory):
return LazyParserCombinator(parserFactory)
This indeed allows me to write down the grammar in a way that is very close to the EBNF:
num = p(lambda: lit("1"))
factor = p(lambda: num | (lit("(") + expr + lit(")")))
term = p(lambda: (factor + lit("*") + term) | factor)
expr = p(lambda: (term + lit("+") + expr) | term)
And it actually works:
tokens = [str(x) for x in "1+(1+1)*(1+1+1)+1*(1+1)"]
print(expr.parse(tokens, 0))
However, the p(lambda: ...)
in every line is a bit annoying. Is there some idiomatic way to get rid of it? It would be nice if one could somehow pass the whole RHS of a rule "by-name", without triggering the eager evaluation of the infinite mutual recursion.
What I've tried
I've checked what's available in the core language: it seems that only if
, and
and or
can "short-circuit", please correct me if I'm wrong.
I've tried looking at how other non-toy-example libraries do this.
For example, funcparserlib uses explicit forward declarations to avoid mutual recursion (look at the
forward_decl
andvalue.define
part in github README.md example code).The
parsec.py
uses some special@generate
decorators and seems to do something like monadic parsing using coroutines. That's all very nice, but my goal is to understand what options I have with regards to the basic evaluation strategies available in Python.
I've also found something like the lazy_object_proxy.Proxy
, but it didn't seem to help to instantiate such objects in more concise way.
So, is there a nicer way to pass arguments by-name and avoid the blowup of mutually recursively defined values?