Determining whether string complies with ANTLR4 grammar

Question

How can I test a string against my grammar just to see whether it's valid (i.e. that no errors were found, and no error recovery was necessary)?

I've tried this as well as a custom error strategy but I'm still getting messages like

line 1:2 token recognition error at: 'x'

on the console. So I need either a way to ensure all errors result in exceptions, or a way to validate input that doesn't rely on exceptions.

Anyone interested in getting good error messages may also want to look at [this](http://stackoverflow.com/a/19406255/446591) — Brad Mace, Oct 09 '14 at 14:17

score 6 · Accepted Answer · answered Oct 08 '14 at 16:26

Edit: What you are seeing is a lexer error, not a parser error. To make sure the lexer can never fail to match an input character, add the following as the last rule of your lexer. It passes any otherwise unmatched character on to the parser as a token, so the parser handles it (reporting, recovery, etc.).

ERR_CHAR : . ;

Beyond that, configuring the parser for simple string recognition takes two steps:

First, disable the default error reporting mechanism(s).

parser.removeErrorListeners();

Second, disable the default error recovery mechanism(s).

parser.setErrorStrategy(new BailErrorStrategy());

You'll get a ParseCancellationException, and no other reporting, if your string does not match.

If you aren't using the output from the parse operation, you may also wish to improve the efficiency of the recognition process by disabling parse tree construction.

parser.setBuildParseTree(false);
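
Put together, the steps above might look like the following minimal sketch. The names `GrammarCheck`, `strict`, `matches`, and `MyLexer` are hypothetical; `MyParser` and `rootRule` stand in for whatever ANTLR generates from your grammar. `CharStreams.fromString` assumes ANTLR 4.7 or later (older runtimes used `ANTLRInputStream`).

```java
import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.misc.ParseCancellationException;

/**
 * Hypothetical helper, not part of ANTLR. Assumed usage with generated classes:
 *
 *   MyLexer lexer = new MyLexer(CharStreams.fromString(text));
 *   lexer.removeErrorListeners(); // silence console output from the lexer too
 *   MyParser parser = GrammarCheck.strict(new MyParser(new CommonTokenStream(lexer)));
 *   boolean ok = GrammarCheck.matches(parser::rootRule);
 */
final class GrammarCheck {
    private GrammarCheck() {}

    /** Applies the three settings described above to an already constructed parser. */
    static <P extends Parser> P strict(P parser) {
        parser.removeErrorListeners();                    // no default error reporting
        parser.setErrorStrategy(new BailErrorStrategy()); // no error recovery
        parser.setBuildParseTree(false);                  // skip tree construction
        return parser;
    }

    /** Runs the parse action and maps the bail-out exception to a boolean. */
    static boolean matches(Runnable parse) {
        try {
            parse.run();
            return true;
        } catch (ParseCancellationException e) {
            return false; // thrown by BailErrorStrategy on the first syntax error
        }
    }
}
```

One caveat worth checking in your own grammar: if the start rule does not end with `EOF`, the parser may stop after a valid prefix and report success even though trailing input was never examined.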

Some good tips (thanks!), but I'm not quite sold on `BailErrorStrategy` due to the complete lack of any useful message. It's really stunning to me that ANTLR doesn't include messages with its exceptions--am I missing something? — Brad Mace, Oct 08 '14 at 19:40
@BradMace I believe that is a separate question, since the one here says "no errors...needed". As the question is phrased, `BailErrorStrategy` will do everything you need, and be efficient on top of it. — Sam Harwell, Oct 08 '14 at 20:50

score 1 · Answer 2 · answered Oct 08 '14 at 19:32

A quick and dirty solution...

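// Declare the generated parser type; the base Parser class has no rootRule() method.
MyParser p = new MyParser(myTokenStream);
p.rootRule();

// Errors the parser recovered from are still counted, so nonzero means invalid input.
if (p.getNumberOfSyntaxErrors() > 0) {
    throw new RuntimeException("Syntax error!");
}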

This won't catch lexical errors that the parser never notices (e.g. extraneous characters that the lexer reports and then skips), because the parser's count of syntax errors will still be zero.

This is a good solution if you don't want to mess around with the ErrorListeners and you don't care about certain lexer errors which the parser can get around.
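
If lexer errors matter as well, one possible variation (a sketch, not something this answer proposed) is to count errors with a shared listener attached to both the lexer and the parser. `CountingErrorListener` and `getErrorCount` are hypothetical names; `BaseErrorListener` and `syntaxError` come from the ANTLR 4 runtime.

```java
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

/**
 * Hypothetical listener that counts every error reported to it.
 * Assumed usage, with lexer and p being the generated lexer and parser:
 *
 *   CountingErrorListener errors = new CountingErrorListener();
 *   lexer.removeErrorListeners();
 *   lexer.addErrorListener(errors);
 *   p.removeErrorListeners();
 *   p.addErrorListener(errors);
 *   p.rootRule();
 *   boolean valid = errors.getErrorCount() == 0;
 */
final class CountingErrorListener extends BaseErrorListener {
    private int errorCount;

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        // Both lexer and parser errors arrive through this callback.
        errorCount++;
    }

    int getErrorCount() {
        return errorCount;
    }
}
```

Replacing the default listeners also stops the messages from being printed to the console. Error recovery still runs, so this does not avoid the performance cost of recovery.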

answered by hendryau

This doesn't address any of the concerns in the question. – Sam Harwell Oct 08 '14 at 19:53
@SamHarwell The question was how to validate input... this solution is certainly not an end-all be-all solution, but checking the value of `Parser#getNumberOfSyntaxErrors()` will help validate/invalidate the input. – hendryau Oct 08 '14 at 20:04
@SamHarwell I was also thinking that this, along with overriding the Lexer's `recover` methods to rethrow exceptions could (crudely) do the job. Are there other errors that wouldn't be detected this way? – Brad Mace Oct 08 '14 at 20:22
Sorry for not being more detailed originally. The problem with this solution is it doesn't disable error reporting or error recovery. Even if you account for those, for the task at hand it's simply *much*, *much* slower than necessary. – Sam Harwell Oct 08 '14 at 20:49
The question wasn't "how to disable error reporting" and it wasn't "how to disable error recovery". It was "how to validate input", and my solution provides a way to validate some input. I concede it is not the fastest or cleanest solution, but it requires less code and is less error-prone for an inexperienced ANTLR user. – hendryau Oct 08 '14 at 21:41
@SamHarwell Ah, a major performance hit is a legitimate downside. Looks like I was somewhat unclear regarding errors, which I've clarified now, but you may be right that that's a separate question. – Brad Mace Oct 09 '14 at 14:15
