Antlr4: Mismatched input

Question

Here's a simple grammar test I thought would be easy to parse, but I get 'mismatched input' right off the bat and I can't figure out what Antlr is looking for.

The input:

  # include "something" program TEST1 { BLAH BLAH }

My grammar:

  grammar ProgHeader;

  program: header* prog EOF ;
  header: '#' ( include | define ) ;
  include: 'include' string ;
  define: 'define' string string? ;
  string: '"' QTEXT '"' ;
  prog: 'program' QTEXT '{' BLOCK '}' ;
  QTEXT: ~[\r\n\"]+ ;
  BLOCK: ~[}]+ ; // don't care, example block
  WS: [ \t\r\n] -> skip ;

The output error message:

line 1:0 mismatched input '# include "something" program TEST1 { BLAH BLAH '
expecting {'program', '#'}

This really confuses me because it says it's looking for a '#' and there's one right at the start of the input. I dumped the parse tree too. It appears to be stuck right at the top, at the 'program' rule:

(program # include "something" program TEST1 { BLAH BLAH  } )

Halp?

Here's the full program driving this test case if it matters (I don't think it should matter, the above info is enough, but here it is):

  package antlrtests;

  import antlrtests.grammars.*;
  import org.antlr.v4.runtime.*;
  import org.antlr.v4.runtime.tree.*;

  /**
   *
   * @author Brenden Towey
   */
  public class ProgHeaderTest {
     private String[] testVectors = {
        "# include \"something\" program TEST1 { BLAH BLAH } ",
     };
     public void runTests() {
        for( String test : testVectors )
           simpleTest( test );
     }
     private void simpleTest( String test ) {
        ANTLRInputStream ains = new ANTLRInputStream( test );
        ProgHeaderLexer wpl = new ProgHeaderLexer( ains );
        CommonTokenStream tokens = new CommonTokenStream( wpl );
        ProgHeaderParser wikiParser = new ProgHeaderParser( tokens );
        ParseTree parseTree = wikiParser.program();
        System.out.println( "'" + test + "': " + parseTree.toStringTree(
                wikiParser ) );
     }
  }

And the full output:

run:
line 1:0 mismatched input '# include "something" program TEST1 { BLAH BLAH ' expecting {'program', '#'}
'# include "something" program TEST1 { BLAH BLAH } ': (program # include "something" program TEST1 { BLAH BLAH  } )
BUILD SUCCESSFUL (total time: 0 seconds)

score 3 · Accepted Answer · edited May 03 '13 at 19:10

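The lexer runs before the parser and does not know which tokens the parser expects; at each position it creates the token for whichever lexer rule matches the longest text. At the very beginning of your input, QTEXT matches the text # include (the text up to but not including the first " character), and BLOCK matches even more: everything up to but not including the first } character, which is exactly the text quoted in the error message. The valid tokens at that point are 'program' and '#', as reported, but the '#' rule matches only one character, so it loses. So it is better to avoid token definitions that match almost anything.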
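To make the longest-match behaviour concrete, here is a rough sketch in plain Java that imitates it with java.util.regex. It does not use the ANTLR runtime; the class name LongestMatchDemo and the method nextToken are made up for illustration, and the regular expressions are only approximations of the lexer rules in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: imitates "longest match wins, first rule wins ties".
class LongestMatchDemo {
    // Rules in the order a lexer would see them: the literals used in the
    // parser rules first, then the named lexer rules from the grammar.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        for (String lit : new String[] {"#", "include", "define", "\"", "program", "{", "}"}) {
            RULES.put("'" + lit + "'", Pattern.compile(Pattern.quote(lit)));
        }
        RULES.put("QTEXT", Pattern.compile("[^\\r\\n\"]+")); // ~[\r\n\"]+
        RULES.put("BLOCK", Pattern.compile("[^}]+"));        // ~[}]+
        RULES.put("WS", Pattern.compile("[ \\t\\r\\n]"));    // [ \t\r\n]
    }

    /** Returns "RULE:text" for the rule matching the longest text at pos, or null if none match. */
    static String nextToken(String input, int pos) {
        String bestRule = null;
        int bestEnd = pos;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            m.region(pos, input.length());
            // Strictly longer matches replace the current best, so the
            // earliest rule keeps the token when two matches are equally long.
            if (m.lookingAt() && m.end() > bestEnd) {
                bestRule = rule.getKey();
                bestEnd = m.end();
            }
        }
        return bestRule == null ? null : bestRule + ":" + input.substring(pos, bestEnd);
    }

    public static void main(String[] args) {
        String input = "# include \"something\" program TEST1 { BLAH BLAH } ";
        System.out.println(nextToken(input, 0));
    }
}
```

Run on the test vector from the question, this sketch picks BLOCK for the first token, with the same text that appears in the error message. A lone # as input is still tokenized as '#', because then all candidate matches are one character long and the earliest rule wins the tie.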

edited May 03 '13 at 19:10

Sam Harwell

answered May 03 '13 at 18:23

Gunther


I don't think I understand. Are you saying that *all* lexer rules are always active? That seems mighty odd, even broken. It might be true, but it would seem to limit the usefulness of Antlr. The way I read it, the rules (not tokens) active at the top level are "#" and "program". I'll leave this question open until a few more folks chime in. – markspace May 03 '13 at 19:03
@user2338547 In ANTLR 4, one lexer *mode* is active at a time, and the non-`fragment` lexer rule in that mode that matches the longest text determines which token is created. Your grammar only includes one mode (the default mode), so yes, all the lexer rules will be active at once. – Sam Harwell May 03 '13 at 19:11
OK, I need to think on this. "Longest non-fragment lexer rule"... longest determined by what? What rule produces the longest token (text string)? – markspace May 03 '13 at 19:24
A helpful message from Antlr would indicate that the QTEXT lexer rule was matched instead of the two expected ones: header and prog. – Stephan May 21 '20 at 19:47
