Antlr4 how to detect unrecognized token and given sentence is invalid

Question

I am trying to develop a new language with ANTLR. Here is my grammar file:

grammar test;

program : vr'.' to'.' e 
        ;
e: be
 | be'.' top'.' be
 ;
be: 'fg' 
  | 'fs' 
  | 'mc' 
  ;
to: 'n' 
  | 'a' 
  | 'ev' 
  ;
vr: 'er' 
  | 'fp' 
  ;
top: 'b' 
  | 'af' 
  ;
Whitespace : [ \t\r\n]+ ->skip 
           ;

Main.java

String expression = "fp.n.fss";
//String expression = "fp.n.fs.fs";
ANTLRInputStream input = new ANTLRInputStream(expression);
testLexer lexer = new testLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
testParser parser = new testParser(tokens);
//remove listener and add listener does not work
ParseTree parseTree = parser.program();

Everything is good for valid sentences. But I want to catch unrecognized tokens and invalid sentences in order to return meaningful messages. Here are two test cases for my problem.

fp.n.fss => ANTLR reports the error token recognition error at: 's', but I could not handle this error. There are some example error handler classes that use BaseErrorListener, but they do not work in my case.
fp.n.fs.fs => this sentence is invalid for my grammar, but I could not catch it. How can I catch invalid sentences like this one?

score 3 · Accepted Answer · edited May 23 '17 at 11:45

Firstly, welcome to SO and to the ANTLR section! Error handling is one of those topics that gets asked about frequently; there's a really good thread here about handling errors in Java/ANTLR4.

You most likely want to extend the DefaultErrorStrategy so that these particular issues are handled in some way other than just printing an error line such as line 1:12 token recognition error at: 's'.

To do this you can implement your own version of the default error strategy class:

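import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.misc.ParseCancellationException;

// Declare the generated type (not the Parser base class) so that parser.program() is visible.
testParser parser = new testParser(tokens);
parser.setErrorHandler(new DefaultErrorStrategy() {

    // Called after a syntax error: record it on every enclosing context, then abort the parse.
    @Override
    public void recover(Parser recognizer, RecognitionException e) {
        for (ParserRuleContext context = recognizer.getContext(); context != null; context = context.getParent()) {
            context.exception = e;
        }

        throw new ParseCancellationException(e);
    }

    // Called when the current token does not match the expected one: abort instead of repairing inline.
    @Override
    public Token recoverInline(Parser recognizer) throws RecognitionException {
        InputMismatchException e = new InputMismatchException(recognizer);
        for (ParserRuleContext context = recognizer.getContext(); context != null; context = context.getParent()) {
            context.exception = e;
        }

        throw new ParseCancellationException(e);
    }
});

parser.program(); // back to the first rule in your grammar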
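Because the strategy above aborts by throwing ParseCancellationException, the caller has to catch that exception to produce a message of its own. The following is a minimal sketch of one way to do that, assuming the ANTLR 4 Java runtime is on the classpath; ParseErrors and describe are hypothetical names introduced here for illustration, not part of ANTLR, and the message wording is arbitrary.

```java
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.misc.ParseCancellationException;

/** Hypothetical helper (not an ANTLR class): turns the thrown exception into a readable message. */
public final class ParseErrors {
    private ParseErrors() {}

    public static String describe(ParseCancellationException ex) {
        // The strategy wraps the original RecognitionException as the cause.
        Throwable cause = ex.getCause();
        if (cause instanceof RecognitionException) {
            Token bad = ((RecognitionException) cause).getOffendingToken();
            if (bad != null) {
                return "Invalid input '" + bad.getText() + "' at line " + bad.getLine()
                        + ", column " + bad.getCharPositionInLine();
            }
        }
        // No token information available: fall back to a generic message.
        return "Invalid input";
    }

    // Intended use, with the parser configured as above:
    //
    //   try {
    //       parser.program();
    //   } catch (ParseCancellationException ex) {
    //       System.err.println(ParseErrors.describe(ex));
    //   }
}
```

This only covers errors raised by the parser. Characters the lexer cannot tokenize are reported separately by the lexer and never reach the error strategy.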

I would also recommend splitting your parser and lexer rules up, not only for readability but also because many tools used to analyse the .g4 file (ANTLRWorks 2 in particular) will complain about implicit token declarations.

For your example, the grammar can be modified to the following structure:

grammar test;

program : vr DOT to DOT e 
        ;
e: be
 | be DOT top DOT be
 ;
be: FG 
  | FS
  | MC 
  ;
to: N
  | A 
  | EV
  ;
vr: ER 
  | FP 
  ;
top: B
  | AF
  ;
Whitespace : [ \t\r\n]+ ->skip 
           ;

DOT : '.'
    ;

A: 'A'|'a'
 ;

AF: 'AF'|'af'
 ;
N: 'N'|'n'
 ;
MC: 'MC'|'mc'
 ;
EV:'EV'|'ev'
 ;
FS: 'FS'|'fs'
 ;
FP: 'FP'|'fp'
 ;
FG: 'FG'|'fg'
 ;
ER: 'ER'|'er'
 ;
B: 'B'|'b'
 ;

You can also find all the methods available for the DefaultErrorStrategy class here, and by overriding those methods in your "new" error strategy implementation you can handle whatever exceptions you require.

Hope this helps, and good luck with your project!

First of all, thanks for helping. But when I add your DefaultErrorStrategy solution, there is still the error token recognition error at: 's'. Also, the reportInputMismatch and reportMissingToken methods are not working in my case. Are there any steps to get rid of this error and print meaningful error messages? — Yunus, Sep 17 '16 at 05:35
@yunsk You can also add an error strategy to the lexer! This error strategy will handle the "unrecognized token" errors. Or you can add an "error token" at the end of your lexer definition like so: ErrorCharacter : . ; The effect is that all unrecognized characters will be caught in that token, sent to the parser, and will trigger the error strategy within the parser. — Fabian Deitelhoff, Sep 17 '16 at 09:18
@FDeitelhoff Test case: fp.n.fs.fs. The result is below: enter program, LT(1)=fp consume [@0,0:12='fp',<4>,1:0] rule program consume [@1,13:13='.',<6>,1:13] rule program consume [@2,14:18='n',<5>,1:14] rule program consume [@3,19:19='.',<6>,1:19] rule program consume [@4,20:27='fs',<1>,1:20] rule program exit program, LT(1)=. Is there a problem with my grammar? It seems like the parser accepts the input, but this sentence is invalid for my language. — Yunus, Sep 17 '16 at 12:15
I tried both your original grammar and the modified one I pasted with your input: fp.n.fs.fs. Both resulted in the error: line 1:8 mismatched input 'fs' expecting {'b', 'af'}. If you overrode the basic error strategy as shown above, maybe you didn't implement the method that handles it? Also, judging from your output, the last .fs does cause an error, as it is absent: only one fs token, [@4,20:27='fs',<1>,1:20], is encountered, which means the error is occurring and the token is not being recognized, so... — D3181, Sep 17 '16 at 12:26
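The token recognition error discussed in the comments above is reported by the lexer, before the parser's error strategy sees any token. As far as I know, the Java runtime's Lexer has no setErrorHandler method, but both lexer and parser report through error listeners, so one way to capture these messages is to attach a listener to both. This is a minimal sketch, assuming the ANTLR 4 Java runtime; CollectingErrorListener is a hypothetical name, not an ANTLR class, and testLexer and testParser are the classes generated from the grammar in the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

/** Hypothetical helper (not an ANTLR class): stores syntax errors instead of printing them. */
public class CollectingErrorListener extends BaseErrorListener {
    private final List<String> errors = new ArrayList<>();

    // Invoked by the lexer or the parser for every error it reports.
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        errors.add("line " + line + ":" + charPositionInLine + " " + msg);
    }

    public boolean hasErrors() {
        return !errors.isEmpty();
    }

    public List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }

    // Intended use with the generated classes from the question:
    //
    //   CollectingErrorListener listener = new CollectingErrorListener();
    //   lexer.removeErrorListeners();   // drop the default console listener
    //   lexer.addErrorListener(listener);
    //   parser.removeErrorListeners();
    //   parser.addErrorListener(listener);
    //   parser.program();
    //   if (listener.hasErrors()) { /* reject the sentence, show listener.getErrors() */ }
}
```

Calling removeErrorListeners() first is what stops the default messages from being printed to the console. If the parser uses a strategy that throws from recoverInline without reporting, as in the answer, some parser errors may surface only as the thrown exception and not in this list.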
