How do you insert extra validation logic into an ANTLR4 parser rule?

Question

I have an ANTLR4 grammar with a parser rule alternative like the one below:

| expression operator='=' expression    #AssignmentExpression

This rule is part of a large compound rule that defines an expression. In reality, only a subset of expression types is valid on the left-hand side of an assignment, but because of left-recursion issues I cannot narrow the parser rule down to those specific expression subsets. What I want to do is insert custom code into the generated parser, run when this rule is matched, that evaluates the innermost type of the left-hand expression to ensure it is one of the valid types. If it is not, ideally I would register a custom parser error, something like "Invalid expression on the left-hand side of assignment. Root expression must be of type identifier or property reference." I'm sure there is a way to do this with ANTLR4, but I have not been able to find the proper method.

I am creating a lexer/parser for a language called Moo that is used in an object-based MUD environment. I noticed that the server's parser (written with yacc/bison) takes a similar approach: it allows expression '=' expression, but then interrogates the left-hand expression to ensure it is of the correct subtype and otherwise generates a parser error. If, however, this is not the correct way to do such a thing in ANTLR, I would love to be corrected and shown the right way to achieve this.

For anyone curious about further details: the language only allows a property reference or an identifier on the left-hand side, but those can be indexed, so a[1] = 1 is still valid. This is why I need not only to check the type of the left-hand expression but also to determine its root expression type (in this case, the identifier 'a').

Do you allow function calls to return objects which can be indexed? — rici, Aug 18 '22 at 00:33
See https://github.com/antlr/antlr4/blob/master/doc/listeners.md#listening-during-the-parse . You can instead execute the listener/visitor after the parse, which is what I think Mike is alluding to below. Note, "antlr4cs" (unmaintained, forked private copy) != "antlr4" (standard/official version that is maintained). — kaby76, Aug 18 '22 at 13:26

Mike Cargal · Accepted Answer · 2022-08-18T21:48:16.787


This seems a good example of a situation where many of us suggest NOT trying to shoehorn everything about your language into the grammar.

You have a grammar that correctly builds a parse tree for the only valid interpretation of the input stream. The parser has done its job for you.

Now, using a listener (maybe a visitor if you find it a better fit), you can identify this situation and report out exactly the error message you want.

You create your listener by inheriting from <grammarName>BaseListener and overriding the applicable enter* or exit* methods for whatever context node you're interested in (in your case, enterAssignmentExpression()). You then use a tree walker to walk the parse tree you got back from the parser, calling your listener as it goes.

 ParseTreeWalker.DEFAULT.walk(<yourListener>, <yourParseTree>);
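To make the callback order concrete, here is a toy Java model of the listener pattern. It is not ANTLR's code: Node, BaseListener and walk are hypothetical simplifications of the generated contexts, the generated <grammarName>BaseListener, and ParseTreeWalker.DEFAULT.walk. The base class has no-op callbacks, a subclass overrides only what it needs, and the walker goes depth-first, calling enter before a node's children and exit after them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the listener pattern; NOT ANTLR's implementation.
public class ListenerPatternDemo {

    // Hypothetical parse-tree node: rule name, matched text, children.
    record Node(String rule, String text, List<Node> children) {
        Node(String rule, String text, Node... kids) {
            this(rule, text, List.of(kids));
        }
    }

    // Stand-in for the generated <grammarName>BaseListener: all callbacks are no-ops.
    static class BaseListener {
        void enterAssignmentExpression(Node n) {}
        void exitAssignmentExpression(Node n) {}
    }

    // Stand-in for ParseTreeWalker.DEFAULT.walk: depth-first, enter before the
    // children, exit after them. (Real generated contexts dispatch to the right
    // callback themselves; a rule-name check stands in for that here.)
    static void walk(BaseListener listener, Node node) {
        boolean isAssignment = node.rule().equals("AssignmentExpression");
        if (isAssignment) listener.enterAssignmentExpression(node);
        for (Node child : node.children()) walk(listener, child);
        if (isAssignment) listener.exitAssignmentExpression(node);
    }

    // A listener overrides only the callbacks it cares about.
    static class RecordingListener extends BaseListener {
        final List<String> events = new ArrayList<>();
        @Override void enterAssignmentExpression(Node n) { events.add("enter " + n.text()); }
        @Override void exitAssignmentExpression(Node n) { events.add("exit " + n.text()); }
    }

    // Hand-built tree for `a = (b = 1)`: one assignment nested inside another.
    static List<String> demo() {
        Node inner = new Node("AssignmentExpression", "b = 1",
                new Node("IdentifierExpression", "b"), new Node("LiteralExpression", "1"));
        Node outer = new Node("AssignmentExpression", "a = (b = 1)",
                new Node("IdentifierExpression", "a"),
                new Node("ParenthesizedExpression", "(b = 1)", inner));
        RecordingListener listener = new RecordingListener();
        walk(listener, outer);
        return listener.events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The nested assignment is entered and exited while the outer one is still open, so an enterAssignmentExpression override sees every assignment in the tree, however deeply it is nested, without any traversal code of its own.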

In this case, you might label your expressions:

| lhs=expression operator='=' rhs=expression    #AssignmentExpression

Then, in your enterAssignmentExpression() override, examine the lhs expression (available as ctx.lhs), and if it’s the wrong type of expression, add your error message to your collection of errors. The grammar remains straightforward, and you can be as specific as you want with your error message.
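The check itself can be sketched in plain Java. This is a minimal sketch under assumptions: the records Identifier, Property, Index, Call and Literal are hypothetical stand-ins for whatever labeled alternatives your expression rule actually has, and enterAssignment(...) plays the role of an enterAssignmentExpression(ctx) override receiving ctx.lhs. It unwraps any chain of indexing to find the root and records an error, prefixed with line:column, when that root is neither an identifier nor a property reference.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: these records are hypothetical stand-ins for the labeled
// alternatives of the `expression` rule. In real ANTLR code you would test
// ctx.lhs against the generated *Context classes instead.
public class AssignmentLhsCheck {

    interface Expr {}
    record Identifier(String name) implements Expr {}
    record Property(Expr object, String name) implements Expr {}
    record Index(Expr target, Expr index) implements Expr {}
    record Call(String function) implements Expr {}
    record Literal(String text) implements Expr {}

    // Strip any number of index operations: a[1][2] -> a, o.p[3] -> o.p
    static Expr root(Expr e) {
        while (e instanceof Index idx) {
            e = idx.target();
        }
        return e;
    }

    // Plays the role of the listener: one error string per invalid left-hand side.
    static class LhsValidator {
        final List<String> errors = new ArrayList<>();

        // What an enterAssignmentExpression(ctx) override might do with ctx.lhs.
        // With ANTLR, line/column could come from ctx.lhs.getStart().getLine()
        // and ctx.lhs.getStart().getCharPositionInLine().
        void enterAssignment(Expr lhs, int line, int column) {
            Expr r = root(lhs);
            if (!(r instanceof Identifier || r instanceof Property)) {
                errors.add("line " + line + ":" + column
                        + " Invalid expression on the left-hand side of assignment."
                        + " Root expression must be of type identifier or property reference.");
            }
        }
    }

    // Four sample left-hand sides, one per line of a hypothetical input.
    static List<String> validateSamples() {
        LhsValidator v = new LhsValidator();
        v.enterAssignment(new Index(new Identifier("a"), new Literal("1")), 1, 0);   // a[1] = 1
        v.enterAssignment(new Index(new Index(new Property(new Identifier("o"), "name"),
                new Literal("2")), new Literal("3")), 2, 0);                          // o.name[2][3] = "x"
        v.enterAssignment(new Literal("5"), 3, 0);                                    // 5 = 1
        v.enterAssignment(new Index(new Call("f"), new Literal("1")), 4, 0);          // f()[1] = 2
        return v.errors;
    }

    public static void main(String[] args) {
        validateSamples().forEach(System.out::println);
    }
}
```

Here only lines 3 and 4 are reported. Whether f()[1] = 2 should really be rejected depends on rici's question above about indexable function results; allowing it only means widening the instanceof test. Collecting errors in a list rather than throwing lets the walk continue, so every bad assignment in the input is reported in one pass.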

answered Aug 18 '22 at 11:42 by Mike Cargal, edited Aug 18 '22 at 21:48

Ah, I understand: I would simply add a custom listener implementation and add it to the parser right before parsing. Thank you, that makes perfect sense! I will implement your suggestion with the named operands too. – WiredWiz Aug 18 '22 at 21:07
No, it's not an errorListener (that's what you'd attach to the parser before parsing). – Mike Cargal Aug 18 '22 at 21:42
I've amended the answer to give a brief mention of how to call the listener. – Mike Cargal Aug 18 '22 at 21:48
Yes, that was what I meant, sorry if it wasn't clear, but I very much appreciate the expanded explanation. I assume there is no reason I can't also hook it up before parsing via the parser.AddParseListener() method, so that it gets called as parsing happens, rather than walking the tree afterwards? – WiredWiz Aug 19 '22 at 22:28
I would recommend against it. Your listener will have to deal with a ParseTree that is still being built (there are other warnings here: https://www.antlr.org/api/Java/org/antlr/v4/runtime/Parser.html#addParseListener(org.antlr.v4.runtime.tree.ParseTreeListener) ). It works well for the TraceListener, but for semantic validation, I find it’s much easier to deal with the finished tree after the parse completes. – Mike Cargal Aug 20 '22 at 11:10