
I'm using ANTLR with C# to create a simple parser for C-like structs. The runtime version is 4.7. The grammar looks like this:

    grammar OdefGrammar;

    structDef : STRUCT ID OPENBLOCK (fieldDef)+ CLOSEBLOCK ;
    fieldDef : (namespaceQualifier)+ ID ID SEMICOLON ;
    namespaceQualifier : ID DOT ;

    /*
     * Lexer Rules
     */

    ID : [a-zA-Z_] [a-zA-Z0-9_]* ;
    STRUCT : 'struct' ;
    NAMESPACE : 'namespace' ;
    OPENBLOCK : '{' ;
    CLOSEBLOCK : '}' ;
    DOT : '.' ;
    SEMICOLON : ';' ;
    WHITESPACE : (' '|'\t')+ -> skip;

Now when I run the parser like this:

    using System;
    using Antlr4.Runtime;

    var test = "struct Stest { type name; }";
    var lexer = new OdefGrammarLexer(new AntlrInputStream(test));
    var tokenStream = new CommonTokenStream(lexer);
    var parser = new OdefGrammarParser(tokenStream);

    // Parse the input starting from the structDef rule
    var ctx = parser.structDef();
    Console.Out.WriteLine(ctx.ToString());

I get this error output:

    line 1:0 missing 'struct' at 'struct'
    line 1:7 extraneous input 'Stest' expecting '{'
    line 1:20 missing '.' at 'name'
    line 1:24 mismatched input ';' expecting '.'

The first error in the output is particularly interesting: it seems the parser fails to find a match where it should. I suspect a problem with string locale/encoding, but I'm not sure how to tackle that in ANTLR.

Any help is much appreciated.

1 Answer

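  1. The ID rule must come after the STRUCT and NAMESPACE rules (and any other rule it can collide with). When an input can match several lexer rules, ANTLR takes the longest match, and if several rules match the same length, the one defined first wins. With ID defined first, the input 'struct' is lexed as an ID token, which is why the parser reports "missing 'struct' at 'struct'".
  2. The ID rule can also be written with character ranges. The [a-zA-Z_] character-set notation from the question is valid ANTLR 4 as well, so this is an equivalent alternative rather than a fix:

    ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;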
    
Jiri Tousek