
Hopefully this is just the right amount of information to help me solve this problem.

Given the following ANTLR3 grammar:

grammar mygrammar;

program : statement* | function*;

function : ID '(' args ')' '->' statement+ (','statement+) '.' ;    

args    : arg (',' arg)*;       

arg     : ID ('->' expression)?;

statement : assignment
          | number
          | string
          ;

assignment : ID '->' expression;    

string  : UNICODE_STRING;

number : HEX_NUMBER | INTEGER ( '.' INTEGER )?;


// ================================================================

HEX_NUMBER : '0x' HEX_DIGIT+;

INTEGER : DIGIT+;

fragment
DIGIT   :   ('0'..'9');

Here is the line that is causing problems in the parser.

my_function(x, y, z -> 42) -> 10001.

ANTLRWorks highlights the last . after the 10001 in red as being a problem with the following error.

How can I make this stop throwing org.antlr.runtime.EarlyExitException?

I am sure this is because of some ambiguity between my number parser rule and trying to use the . as an EOL delimiter.

2 Answers


There is another ambiguity that also needs fixing. Change:

program : statement* | function*;

into:

program  : (statement | function)*;

(Although the two are not equivalent, I'm guessing you want the latter.)
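The difference between the two forms can be illustrated with a regular-expression analogy. This is only a sketch, not ANTLR code: each top-level item is abbreviated to a single letter (S for a statement, F for a function), an encoding invented here purely for illustration.

```python
import re

# Hypothetical one-letter encoding: "S" = statement, "F" = function.
ORIGINAL = re.compile(r"S*|F*")    # program : statement* | function*;
SUGGESTED = re.compile(r"(S|F)*")  # program : (statement | function)*;

def accepts(pattern, items):
    """Return True if the whole item sequence matches the pattern."""
    return pattern.fullmatch(items) is not None

for items in ["SSS", "FF", "SFS"]:
    print(items, accepts(ORIGINAL, items), accepts(SUGGESTED, items))
```

The original form accepts only statements or only functions, so a mixed sequence such as "SFS" is rejected; the suggested form accepts any interleaving.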

And in your function rule, you now defined there to be at least 2 statements:

function : ID '(' args ')' '->' statement (','statement)+ '.' ; 

while I'm guessing you really want at least one:

function : ID '(' args ')' '->' statement (','statement)* '.' ; 

Now, your real problem: because you're constructing floats in a parser rule, when the parser reaches the end of your input, 10001., it treats the . as the start of the fractional part of a number and then expects another INTEGER. What you want is for it to match an INTEGER followed by the terminating ., as you already said in your question.

To fix this, you need to give the parser a bit of extra look-ahead so it can "see" past this ambiguity. Do that by adding the syntactic predicate (INTEGER '.' INTEGER)=> before actually matching said input:

number
  :  HEX_NUMBER
  |  (INTEGER '.' INTEGER)=> INTEGER '.' INTEGER
  |  INTEGER
  ;

Now your input will generate the following parse tree:

[image: parse tree]
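What the predicate does can be sketched in plain Python. This is a hand-written illustration, not code generated by ANTLR: the tokenizer, the function names, and the hex-digit set (HEX_DIGIT is not shown in the question) are all assumptions made for the example.

```python
import re

# Simplified tokenizer assumed for this sketch; HEX_DIGIT is guessed as [0-9a-fA-F].
TOKEN_RE = re.compile(
    r"(?P<HEX_NUMBER>0x[0-9a-fA-F]+)|(?P<INTEGER>[0-9]+)|(?P<DOT>\.)"
)

def tokenize(text):
    """Split text into (type, text) pairs; whitespace is skipped."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def parse_number(tokens, i=0):
    """Match the number rule at index i; return (matched text, next index)."""
    kinds = [kind for kind, _ in tokens[i:i + 3]]
    if kinds[:1] == ["HEX_NUMBER"]:
        return tokens[i][1], i + 1
    # Emulates (INTEGER '.' INTEGER)=> : look ahead before committing.
    if kinds == ["INTEGER", "DOT", "INTEGER"]:
        return "".join(text for _, text in tokens[i:i + 3]), i + 3
    if kinds[:1] == ["INTEGER"]:
        return tokens[i][1], i + 1
    raise ValueError("expected a number")
```

With this sketch, parse_number(tokenize("10001.")) consumes only the INTEGER and leaves the . for the enclosing rule, while parse_number(tokenize("3.14")) consumes all three tokens.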

Bart Kiers
  • is there a better way to detect the float numbers other than the way I am doing it? –  Nov 11 '11 at 16:24
  • @JarrodRoberson, there are various ways, but all have their drawbacks. The easiest would be to introduce a `FLOAT : DIGIT+ '.' DIGIT+;` in your lexer and change the fact that statements can end with a `'.'`. If you want to keep the `'.'` as an EOL, then the way you're doing it now (including my suggestions) is probably the easiest way to solve it. – Bart Kiers Nov 11 '11 at 16:40

Perhaps unrelated, but I'm curious nonetheless:

function : ID '(' args ')' '->' statement+ (','statement+) '.' ;

Should this instead be:

function : ID '(' args ')' '->' statement (',' statement)* '.' ;

I think the first one requires exactly one comma in every function definition, while the second one uses the comma as a statement separator.
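A quick way to check this reading is a regular-expression analogy. It is only a sketch, not ANTLR code: the function body is encoded with "S" for one statement and "," for the separator, an encoding invented here for illustration.

```python
import re

# Hypothetical encoding: "S" = one statement, "," = the separator.
QUESTION_FORM = re.compile(r"S+,S+")    # statement+ (',' statement+)
SEPARATED_FORM = re.compile(r"S(,S)*")  # statement (',' statement)*

def accepts(pattern, body):
    """Return True if the whole encoded body matches the pattern."""
    return pattern.fullmatch(body) is not None

for body in ["S", "S,S", "S,S,S", "SS,S"]:
    print(body, accepts(QUESTION_FORM, body), accepts(SEPARATED_FORM, body))
```

Under this encoding, a single-statement body ("S"), like the one in the question's input, is rejected by the first form because the comma is mandatory there, and accepted by the second.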

Also, does the rule for args allow z -> 42 correctly?

sarnold
  • @BartKiers (and sarnold) thanks for the input. I am really trying to grok ANTLR3. I have both of the Terence Parr books, but the theory behind the lexer and parser rules is still pretty opaque. I really appreciate the help! –  Nov 11 '11 at 15:22
  • @JarrodRoberson, you're welcome. Also see this previous Q&A about the difference between lexer- and parser rules: http://stackoverflow.com/questions/4297770/practical-difference-between-parser-rules-and-lexer-rules-in-antlr – Bart Kiers Nov 11 '11 at 16:07