
I am writing an ANTLR grammar for an esoteric programming language, and I want to add custom error messages that are output when certain forms of syntax are written.

Here is an MCVE:

grammar Foo;

OPEN_BRACE: '{';
CLOSE_BRACE: '}';
SEMI_COLON: ';';
PERIOD: '.';
WS: ' ' -> skip;
NOOP: 'noop';

statements
    : NOOP
    | NOOP SEMI_COLON statements
    // other kind of statements
    ;

program
    : OPEN_BRACE
      statements
      CLOSE_BRACE
      EOF
    ;

Suppose I want to output a custom error message if the program ends with a semicolon, or if there are no statements between the braces. Following this answer, I added some error alternatives:

program
    : OPEN_BRACE
      statements
      CLOSE_BRACE
      EOF
    | invalidPrograms
    ;

invalidPrograms
    : OPEN_BRACE
      statements
      CLOSE_BRACE
      SEMI_COLON
      EOF { notifyErrorListeners($SEMI_COLON, "program must not end with semicolon!", null); }
    | OPEN_BRACE
      CLOSE_BRACE
      EOF { notifyErrorListeners($OPEN_BRACE, "program must have at least one statement!", null); }
    ;

I then use an ErrorListener to collect all the errors the parser and lexer generate.

This works, but it also turns the other, default error messages generated by ANTLR into a single "no viable alternative" message. For example, if I try to parse the syntactically invalid input:

{noop

Before I added the error alternatives, it would produce a very useful error message:

line 1:5 missing '}' at '<EOF>'

Or if I write

{noop.}

the error message says:

line 1:5 extraneous input '.' expecting '}'

which is rather descriptive. However, if I add in the error alternatives, the error messages become:

line 1:5 no viable alternative at input...

which is undesirable. How can I keep the good error messages, while still using error alternatives?

Sweeper

1 Answer


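By adding these rules you're inadvertently undermining some of the power of ANTLR's error reporting and error recovery, because you've added rules that make these erroneous inputs "valid" to the parser (specifically, alternatives that run all the way to <EOF>). Since `program` now has several alternatives that share the prefix `OPEN_BRACE statements CLOSE_BRACE`, the parser has to look ahead as far as EOF to decide which one applies. When none of them fits, it reports "no viable alternative" at that decision instead of entering the rule and recovering with single-token insertion or deletion, which is what produced messages like "missing '}'" and "extraneous input '.'".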

Generally, both of the examples you're trying to add would be considered semantic errors. It's better to handle these in your own code by evaluating your parse tree with either a listener or a visitor.

There can be valid uses for introducing rules that match specific erroneous syntax in order to generate more user-friendly error messages, but you have to be careful not to undermine the parser's error recovery the way it happened here.

In general, you want a parser that is relatively "accepting" but unambiguous in its interpretation of your input. From that parse tree you can then look for semantic errors like the ones you've identified. (It's a common temptation to put "as much as possible" into the grammar, but this can backfire, as you've seen.)
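A minimal sketch of that approach, assuming the `Foo` grammar from the question: make the statement list and a trailing semicolon optional, so `program` has a single alternative and ANTLR's normal recovery still applies to genuinely malformed input. The labels `body` and `trailingSemi` are hypothetical names introduced here for illustration. After parsing, a listener's `exitProgram` (or a visitor's `visitProgram`) could report "program must have at least one statement!" when `ctx.body == null` and "program must not end with semicolon!" when `ctx.trailingSemi != null`.

```antlr
grammar Foo;

program
    : OPEN_BRACE
      body=statements?          // empty braces are accepted; check body == null after parsing
      CLOSE_BRACE
      trailingSemi=SEMI_COLON?  // a trailing ';' is accepted; check trailingSemi != null after parsing
      EOF
    ;

statements
    : NOOP
    | NOOP SEMI_COLON statements
    // other kind of statements
    ;

OPEN_BRACE: '{';
CLOSE_BRACE: '}';
SEMI_COLON: ';';
PERIOD: '.';
NOOP: 'noop';
WS: ' ' -> skip;
```

Because there is no longer a competing alternative that has to be predicted all the way to EOF, an input such as `{noop` should again fail inside `program` at the missing `'}'`, where the default error strategy can report it as a missing token.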

Mike Cargal