
I'm new to ANTLR and am trying the following grammar in ANTLRWorks 1.4.3.

command
:   'go' SPACE+ 'to' SPACE+ destination
;

destination
:   (UPPER | LOWER) (UPPER | LOWER | DIGIT)*
;

SPACE
:   ' '
;

UPPER
:   'A'..'Z'
;

LOWER
:   'a'..'z'
;

DIGIT
:   '0'..'9'
;

This seems to work OK, except when the destination contains one of the keywords 'go' or 'to'. For instance, if I give the following command:

go to Glasgo

the node-tree is displayed as follows:

[image: parse tree screenshot]

I was expecting it to match the full word as the destination.

I even tried changing the keyword, for example 'travel' instead of 'go'. In that case, if there is 'tr' in the destination, ANTLR complains.

Any idea why this happens, and how to fix it?

Thanks in advance.

Atul Acharya

1 Answer

ANTLR's lexer and parser are strictly separated. Your input is first tokenized, and only then do the parser rules operate on those tokens.

In your case, the input go to Glasgo is tokenized into the following nine tokens:

  1. 'go'
  2. ' ' (SPACE)
  3. 'to'
  4. ' ' (SPACE)
  5. 'G' (UPPER)
  6. 'l' (LOWER)
  7. 'a' (LOWER)
  8. 's' (LOWER)
  9. 'go'

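which leaves a "dangling" 'go' keyword. The lexer does not know which token the parser expects at that point, so the trailing "go" becomes a keyword token rather than two LOWER tokens. This is simply how ANTLR's lexer works: you cannot change this.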
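
To see why the trailing "go" ends up as a keyword, here is a minimal Python sketch of a longest-match tokenizer using the token rules from the question. It is only a simplified model (ANTLR's generated lexer decides between rules with a DFA, not a loop over regular expressions), and the names `RULES` and `tokenize` are made up for illustration.

```python
import re

# Token rules in priority order, mirroring the question's grammar.
# The literals 'go' and 'to' used in the parser rule act as lexer tokens too.
RULES = [
    ("'go'", re.compile(r"go")),
    ("'to'", re.compile(r"to")),
    ("SPACE", re.compile(r" ")),
    ("UPPER", re.compile(r"[A-Z]")),
    ("LOWER", re.compile(r"[a-z]")),
    ("DIGIT", re.compile(r"[0-9]")),
]

def tokenize(text):
    """Split text into (rule name, matched text) pairs.

    Simplified model: at each position take the longest match,
    and let the earlier rule win when two matches are equally long.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens

for token in tokenize("go to Glasgo"):
    print(token)
```

The tokenizer never consults the parser: at the final "go" the keyword rule matches two characters while LOWER matches only one, so the keyword wins and the parser is left with a token that `destination` cannot accept.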

A possible solution in your case would be to make destination a lexer rule instead of a parser rule:

command
:   'go' 'to' DESTINATION
;

DESTINATION
:   (UPPER | LOWER) (UPPER | LOWER | DIGIT)*
;

SPACE
:   ' ' {skip();}
;

fragment UPPER
:   'A'..'Z'
;

fragment LOWER
:   'a'..'z'
;

fragment DIGIT
:   '0'..'9'
;

resulting in:

[image: parse tree screenshot]


If you're not entirely sure what the difference between the two is, see: Practical difference between parser rules and lexer rules in ANTLR?

More about fragments: What does "fragment" mean in ANTLR?


PS. Glasgow?

Bart Kiers
  • thanks much. It's much clearer now. **If a rule is made up of only tokens (produced by lexer), make that a lexer rule.** Is that a correct statement? – Atul Acharya Aug 10 '12 at 17:32
  • @Atul, no, not necessarily (there wouldn't be any parser rules if that were the case...). For example, your `command` is also made up of tokens only, but that should stay a parser rule. Think of lexer rules as the atoms of your language. A "destination" is just a single name, and should therefore be a lexer rule. A "command", however, consists of multiple other (lexer) rules and should be a parser rule. – Bart Kiers Aug 10 '12 at 17:36