Lark parsing based on tree location

Question

I am trying to write a Lark grammar and parser to write a DSL on top of numpy. However the Transformer needs to output Python code, not eval that code. So, for example, I'd like to have:

my_parser("max(mat1/mat2, 20) / lag(mat1, 5)")

and this would produce a string:

'''
v0 = mat1
v1 = mat2
v2 = v0/v1
v3 = np.max(v2[-20:, :], axis=0)
v4 = mat1
v5 = v4[-5, :]
v6 = v3/v5
'''

Where mat1 and mat2 are known numpy matrices. I am trying with:

and this gives

from __future__ import print_function
import itertools
from lark import Lark, Transformer

grammar = r"""

    ?value: list
          | max
          | mat1
          | mat2
          | lag
          | div
          | max
          | SIGNED_NUMBER
          | ESCAPED_STRING

    list : "(" [value ("," value)*] ")"
    lag: "lag" "(" value "," SIGNED_NUMBER ")"
    mat1: "mat1"
    mat2: "mat2"
    max: "max" "(" value "," SIGNED_NUMBER ")"
    div: value "/" value

    %import common.SIGNED_NUMBER
    %import common.ESCAPED_STRING
    %import common.WS
    %ignore WS

    """

class MyTransformer(Transformer):
    vcounter = itertools.count()

    def __init__(self):
        self.nplist = []

    def list(self):
        pass

    def mat1(self, items):
        thisv = self.vcounter.next()
        self.nplist.append(
            "v" + str(thisv) + " = mat1"
        )

    def mat2(self, items):
        thisv = self.vcounter.next()
        self.nplist.append(
            "v" + str(thisv) + " = mat2"
        )

    def div(self, items):
        thisv = self.vcounter.next()
        self.nplist.append(
            "v" + str(thisv) + " = v" + str(thisv - 2) + "/v" + str(thisv-1)
        )

    def lag(self, items):
        thisv = self.vcounter.next()
        self.nplist.append(
            "v" + str(thisv) + " = v" + str(thisv -1) + "[-" + items[1] + ", :]"
        )

    def max(self, items):
        thisv = self.vcounter.next()
        self.nplist.append(
            "v" + str(thisv) + " = np.max(v" + str(thisv-1) + "[-" + items[1] +":, :], axis=0)"
        )

    def transform(self, tree):
        self._transform_tree(tree)
        return self.nplist

my_parser = Lark(grammar, start='value')

text = "max(mat1/mat2, 20) / lag(mat1, 5)"
tree = my_parser.parse(text)
print(*MyTransformer().transform(tree), sep='\n')

and this gives

v0 = mat1
v1 = mat2
v2 = v0/v1
v3 = np.max(v2[-20:, :], axis=0)
v4 = mat1
v5 = v4[-5, :]
v6 = v4/v5

Which is tantalizingly close!

Thanks in advance for any guidance.

You seem to be asking a very different question now. I guess that's OK since no-one answered the original, but ... Anyway, you need every reduction rule to record the name of the temporary it created by saving the name (or number) as the semantic value of the rule. For reasons which should be evident from this example, you can't count on the temporary's name to be `thisv - 1`. — rici, Aug 31 '18 at 17:13
Thanks @rici, I was trying to make the question more clear. Won't change again. :-) — Jonathan, Aug 31 '18 at 17:17

rici · Accepted Answer · 2018-08-31T18:07:00.093

Your program is attempting to generate three-address code (TAC), which is a perfectly acceptable way to proceed. However, it is important that every rule which produces a value returns the name of that value, because the parent rules really have no way to guess what the names will be. In particular, you cannot assume that the names happen to be the last two names generated. It's often true that the second operand will have the last generated name (although not insisting on this allows for some optimisations) but the first operand practically never has the second last name. Indeed, it could only have the second last name if the computation of the second operand required no names at all.

This is clear from the error in the generated code for the outer division operator. The transformer rule you're using says that the operands to / are thisv - 2 and thisv - 1, but that leads to the output of v4/v5. v4 was created by the lag operator in order to compute v5, so it certainly is not the first operand to the /.

To fix this, you just need to return from each action the name (or number) of the value, and then you need to use this name instead of trying to guess it. So div would become:

def div(self, items):
    # Name of this value
    thisv = "v" + str(self.vcounter.next())
    # Output code to define name
    self.nplist.append(
        thisv " = " + items[0] + "/" + items[1])
    )
    # Return name so that whoever uses this value knows it
    return thisv

Whether this is actually the optimal solution is at least open for debate. In python, variables are created in function scope and their values therefore persist until the function returns. The consequence can be the build-up of a lot of garbage while doing computations like this. It might well be worth trying a stack-based approach instead. In the stack-based approach, you can count on the fact that the operands to each operation are on the top of the stack.

In this scenario, you don't necessarily even have to track the stack size (and there are no names to track). It can be useful to track the stack size though (for one thing, it lets you know how much stack each expresion needs), but the tracking looks quite different: it's not a simple counter. Generally, operands increment the stack count, unary operators leave it alone, and binary operators decrement.

One of the advantages of the TAC approach outline above is that it is no longer necessary to generate `v0`, `v1` or `v4`, since you can just use the names `mat1` and `mat2`. (As your grammar is written, that requires no action on your part at all, I believe, since that will be the name in the `items` argument.) I should emphasize that I've never actually used `lark`, so I might be missing some subtlety. But subtleties like that shouldn't exist, or should be clearly documented, so I feel it is safe to assume that things work the way I think they do :-) — rici, Aug 31 '18 at 18:13

Lark parsing based on tree location

1 Answers1