Here's a simple grammar test I thought would be easy to parse, but I get 'mismatched input' right off the bat and I can't figure out what Antlr is looking for.

The input:

  # include "something" program TEST1 { BLAH BLAH }

My grammar:

  grammar ProgHeader;

  program: header* prog EOF ;
  header: '#' ( include | define ) ;
  include: 'include' string ;
  define: 'define' string string? ;
  string: '"' QTEXT '"' ;
  prog: 'program' QTEXT '{' BLOCK '}' ;
  QTEXT: ~[\r\n\"]+ ;
  BLOCK: ~[}]+ ; // don't care, example block
  WS: [ \t\r\n] -> skip ;

The output error message:

line 1:0 mismatched input '# include "something" program TEST1 { BLAH BLAH '
expecting {'program', '#'}

This really confuses me because it says it's looking for a '#' and there's one right at the start of the input. I dumped the parse tree too. It appears to be stuck right at the top, at the 'program' rule:

(program # include "something" program TEST1 { BLAH BLAH  } )

Halp?

Here's the full program driving this test case if it matters (I don't think it should matter, the above info is enough, but here it is):

  package antlrtests;

  import antlrtests.grammars.*;
  import org.antlr.v4.runtime.*;
  import org.antlr.v4.runtime.tree.*;

  /**
   *
   * @author Brenden Towey
   */
  public class ProgHeaderTest {
     private String[] testVectors = {
        "# include \"something\" program TEST1 { BLAH BLAH } ",
     };
     public void runTests() {
        for( String test : testVectors )
           simpleTest( test );
     }
     private void simpleTest( String test ) {
        ANTLRInputStream ains = new ANTLRInputStream( test );
        ProgHeaderLexer wpl = new ProgHeaderLexer( ains );
        CommonTokenStream tokens = new CommonTokenStream( wpl );
        ProgHeaderParser wikiParser = new ProgHeaderParser( tokens );
        ParseTree parseTree = wikiParser.program();
        System.out.println( "'" + test + "': " + parseTree.toStringTree(
                wikiParser ) );
     }
  }

And the full output:

run:
line 1:0 mismatched input '# include "something" program TEST1 { BLAH BLAH ' expecting {'program', '#'}
'# include "something" program TEST1 { BLAH BLAH } ': (program # include "something" program TEST1 { BLAH BLAH  } )
BUILD SUCCESSFUL (total time: 0 seconds)
markspace

1 Answer

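At the very beginning of the input, the lexer picks the longest token that matches. QTEXT matches there (the text # include, up to but not including the first " character), and BLOCK matches even more: everything up to but not including the first } character, which is the text quoted in the error message. Either way the result is one long token, but the only valid tokens at that point are 'program' and '#', as reported. So it is better to avoid token definitions that match almost anything.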
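
The longest-match behavior can be imitated outside ANTLR to see which rule wins at position 0. The sketch below is only an illustration, not how ANTLR is implemented: it uses `java.util.regex` patterns as stand-ins for some of the grammar's lexer rules, and the class name `LongestMatchDemo` and method `firstToken` are made up for this example. It assumes the usual ANTLR 4 conventions that the longest match wins and that ties go to the rule defined first, with literals from parser rules counting as defined before the named lexer rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchDemo {
    // Regex stand-ins for lexer rules from the question's grammar, in definition order.
    // These are assumptions for illustration; ANTLR does not use java.util.regex.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'#'", Pattern.compile("#"));
        RULES.put("'include'", Pattern.compile("include"));
        RULES.put("'program'", Pattern.compile("program"));
        RULES.put("QTEXT", Pattern.compile("[^\\r\\n\"]+"));
        RULES.put("BLOCK", Pattern.compile("[^}]+"));
        RULES.put("WS", Pattern.compile("[ \\t\\r\\n]"));
    }

    /** Returns "RULE:text" for the longest match at pos; earlier rules win ties. */
    public static String firstToken(String input, int pos) {
        String bestRule = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> e : RULES.entrySet()) {
            Matcher m = e.getValue().matcher(input).region(pos, input.length());
            // lookingAt() only accepts a match that starts exactly at pos.
            if (m.lookingAt() && m.end() - pos > bestLen) {
                bestLen = m.end() - pos;
                bestRule = e.getKey();
            }
        }
        return bestRule == null ? null : bestRule + ":" + input.substring(pos, pos + bestLen);
    }

    public static void main(String[] args) {
        String input = "# include \"something\" program TEST1 { BLAH BLAH } ";
        System.out.println(firstToken(input, 0));
    }
}
```

With the test input from the question, BLOCK wins with the text up to the first }, which is the same text the error message quotes as the mismatched input. The '#' literal only wins when nothing longer can match, for example when the input is just "#".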

Sam Harwell
Gunther
  • I don't think I understand. Are you saying that *all* lexer rules are always active? That seems mighty odd, even broken. It might be true, but it would seem to limit the usefulness of Antlr. The way I read it, the rules (not tokens) active at the top level are "#" and "program". I'll leave this question open until a few more folks chime in. – markspace May 03 '13 at 19:03
  • @user2338547 In ANTLR 4, one lexer *mode* is active at a time, and the longest non-`fragment` lexer rule in that mode will determine which token is created. Your grammar only includes one mode (the default mode), so yes, all the lexer rules will be active at once. – Sam Harwell May 03 '13 at 19:11
  • OK, I need to think on this. "Longest non-fragment lexer rule"... longest determined by what? What rule produces the longest token (text string)? – markspace May 03 '13 at 19:24
  • A helpful message from Antlr would be to indicate that the QTEXT lexer rule was matched instead of the two expected ones: header and prog. – Stephan May 21 '20 at 19:47