
Is there a way to parse words that start with a specific character?

I've been trying the following, but I couldn't get any promising results:

//This one works: it accepts AD, CD and such
example1
:
  .'D'
;
//This one is not, it expects character D, then any ws character then any character
example2
:
  'D'.
;
//These two are not working either
example3
:
  'D'.*
;
//Doesn't accept input due to error: "line 1:3 missing 'D' at '<EOF>'"
example4
:
  .*'D'
;


//just in case my WS rule:
/**    WhiteSpace Characters (HIDDEN)*/
WS  :   ( ' '
        | '\t'
        )+ {$channel=HIDDEN;}
    ;

I am using ANTLR 3.4

Thanks in advance

Attila Horváth

1 Answer

//This one is not, it expects character D, then any ws character then any character
example2
:
  'D'.
;

No, it does not: it accepts the token (not character!) 'D' followed by any other token, not 'D' followed by a space and then any character. Since example2 is a parser rule, it does not match characters but tokens (there's a big difference!). And since you put spaces on a separate channel, the parser never sees them, so they are not matched by this rule either. Finally, the . (DOT) matches any token (again: not any character!).

More info on meta chars (like the . (DOT)), whose meaning differs between lexer and parser rules: Negating inside lexer- and parser rules

//These two are not working either
example3
:
  'D'.*
;
//Doesn't accept input due to error: "line 1:3 missing 'D' at '<EOF>'"
example4
:
  .*'D'
;

Unless you know exactly what you're doing, don't use .*: in your case it gobbles up too much input (especially when placed at the start or end of a rule).

It looks like you're trying to tokenize things inside the parser (all your example rules are parser rules). As far as I can see, these should be lexer rules instead. For more on the difference between parser and lexer rules, see: Practical difference between parser rules and lexer rules in ANTLR?
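
As a rough sketch of what moving this into the lexer could look like in ANTLR 3: the grammar name DWords and the rule names words, D_WORD, WORD and LETTER are made up for illustration, and I'm assuming a "word" consists of ASCII letters only and that only an uppercase 'D' counts. Adjust those assumptions to your actual language.

```
grammar DWords;   // hypothetical name; the file would have to be DWords.g

// Parser rule: works on tokens, so whole words arrive as single tokens.
words
  : (D_WORD | WORD)+ EOF
  ;

// Lexer rule: a word starting with 'D', followed by zero or more letters.
D_WORD
  : 'D' LETTER*
  ;

// Lexer rule: any other word (first letter is anything except 'D'),
// so the two word rules never compete for the same input.
WORD
  : ('a'..'z' | 'A'..'C' | 'E'..'Z') LETTER*
  ;

// Helper that never becomes a token on its own.
fragment LETTER
  : 'a'..'z' | 'A'..'Z'
  ;

// Same idea as the WS rule from the question: hide spaces from the parser.
WS
  : (' ' | '\t')+ {$channel=HIDDEN;}
  ;
```

With a grammar along these lines, the decision "does this word start with D?" is made while tokenizing, and the parser only has to tell D_WORD tokens apart from WORD tokens instead of using . or .* on individual characters.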

Bart Kiers