
I am trying to write a grammar to parse SQL WHERE clause expressions, and I am having trouble with the lexer rule that should recognize a unique identifier (UID). My grammar looks like this:

grammar Sample;
UID: '^[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}$';
literal_value : 
           UID
          ;

And my code to parse it is:

    public void compile() {
        String expression = "4B66049D-6E1A-4CE6-8FBF-B31CD8B9E6AF";
        ANTLRInputStream input = new ANTLRInputStream(expression);
        SampleLexer lexer = new SampleLexer(input);
        final CommonTokenStream tokens = new CommonTokenStream(lexer);
        SampleParser parser = new SampleParser(tokens);
        SampleParser.Literal_valueContext context = parser.literal_value();
        System.out.println(context.toStringTree());
    }

But I am getting this error: Exception parsing expression: 'token recognition error at: '4'' on line 1, position 0

2 Answers


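You have fed ANTLR a regular expression, but ANTLR is not a regex engine. In an ANTLR grammar, text in single quotes is a literal string, so your UID rule matches only that exact sequence of characters, beginning with ^. Your input begins with 4, hence the token recognition error at position 0. You need to follow ANTLR's own grammar syntax, some of which is described here: https://github.com/antlr/antlr4/blob/master/doc/grammars.md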

For starters, you do not want ^ and $ at the start and end. Those are regex things, not ANTLR things.
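
To make the difference concrete, here is a small plain-Java analogy (not ANTLR code). It takes the text from inside the quotes of the question's UID rule and checks the input against it twice: once as a regular expression, and once as a literal string, which is roughly how ANTLR reads a quoted string. The class and method names are made up for this illustration.

```java
import java.util.regex.Pattern;

class LiteralVsRegex {
    // The text between the quotes in the question's UID rule.
    static final String RULE_TEXT =
        "^[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}$";

    // Regex reading: RULE_TEXT is a pattern that describes the input.
    static boolean matchesAsRegex(String input) {
        return Pattern.matches(RULE_TEXT, input);
    }

    // Literal reading: the input must consist of exactly the same characters.
    static boolean matchesAsLiteral(String input) {
        return input.equals(RULE_TEXT);
    }

    public static void main(String[] args) {
        String uid = "4B66049D-6E1A-4CE6-8FBF-B31CD8B9E6AF";
        System.out.println("as regex:   " + matchesAsRegex(uid));
        System.out.println("as literal: " + matchesAsLiteral(uid));
    }
}
```

Only the regex reading accepts the sample value. Under the literal reading, the only accepted input would be the pattern text itself.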

John Zwinck

The anchors ^ and $ are not valid in ANTLR, and the {...} repetition quantifier is not supported either.

What you want to do is this:

grammar Sample;

literal_value
 : UID EOF
 ;

UID
 : BLOCK BLOCK '-' BLOCK '-' BLOCK '-' BLOCK '-' BLOCK BLOCK BLOCK
 ;

fragment BLOCK
 : [A-Za-z0-9] [A-Za-z0-9] [A-Za-z0-9] [A-Za-z0-9]
 ;

EOF is a built-in token type that, not surprisingly, denotes the end of the input; requiring it after UID plays the role of the $ anchor. The fragment keyword indicates that a rule never produces a token of its own; it can only be used by other lexer rules. Also see: What does "fragment" mean in ANTLR?
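
As a rough way to read the grammar above, the following hand-written Java sketch mirrors each rule with one method: block for the BLOCK fragment, uid for the UID rule, and literalValue for UID followed by EOF. It is only an analogy with hypothetical names. A lexer generated by ANTLR works differently internally, and in practice you would use the generated SampleLexer and SampleParser.

```java
class UidRuleSketch {
    // Mirrors: fragment BLOCK : [A-Za-z0-9] [A-Za-z0-9] [A-Za-z0-9] [A-Za-z0-9] ;
    // Returns the position after the block, or -1 if four alphanumerics are not present.
    static int block(String s, int pos) {
        if (pos < 0 || pos + 4 > s.length()) return -1;
        for (int i = pos; i < pos + 4; i++) {
            char c = s.charAt(i);
            boolean ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            if (!ok) return -1;
        }
        return pos + 4;
    }

    // Mirrors a '-' literal inside the UID rule.
    static int dash(String s, int pos) {
        return (pos >= 0 && pos < s.length() && s.charAt(pos) == '-') ? pos + 1 : -1;
    }

    // Mirrors: UID : BLOCK BLOCK '-' BLOCK '-' BLOCK '-' BLOCK '-' BLOCK BLOCK BLOCK ;
    static int uid(String s, int pos) {
        pos = block(s, block(s, pos));                              // 8 characters
        for (int i = 0; i < 3; i++) pos = block(s, dash(s, pos));   // three groups of '-' plus 4
        pos = dash(s, pos);
        return block(s, block(s, block(s, pos)));                   // 12 characters
    }

    // Mirrors: literal_value : UID EOF ;  (the whole input must be consumed)
    static boolean literalValue(String s) {
        return uid(s, 0) == s.length();
    }

    public static void main(String[] args) {
        System.out.println(literalValue("4B66049D-6E1A-4CE6-8FBF-B31CD8B9E6AF"));
        System.out.println(literalValue("4B66049D-6E1A-4CE6-8FBF-B31CD8B9E6AFX"));
    }
}
```

The second call has one extra trailing character, so the end-of-input check fails; this is the job EOF does in the literal_value rule.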

Bart Kiers