
For example, I define several lexer rules in my grammar:

INT: 'int';
FLOAT: 'float';
...

fragment DIGIT : [0-9];
NUMERIC : (DIGIT+ | DIGIT+ '.' DIGIT+ | '.' DIGIT+ | DIGIT+ '.');
...

I need to somehow mark the keywords ('int', 'float', and some others) so that when I read tokens from the TokenStream I can filter them by some custom flag.

Is this possible?

Right now I see only one way: unite the relevant lexer rules into a single rule.
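For illustration, that workaround might look like the following sketch (the KEYWORD rule name is mine, not from my actual grammar):

```
KEYWORD : 'int' | 'float' ;  // every keyword collapses into one token type
```

The drawback is that the parser then sees only KEYWORD and can no longer distinguish the individual keywords.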

Update

I tried to apply the first option from the answer below, but ran into the following problem. I got the error: 'TOKENNAME is not a recognized token name'.

There was an issue covering this case. I applied the recommendation from there:

use

options { tokenVocab = MyLexer; }

instead of

import MyLexer;

and got the error: 'error(114): MyParser.g4:3:23: cannot find tokens file .\MyLexer.tokens'

As I understand from here, this may happen when the ANTLR source files (MyParser.g4, MyLexer.g4) are placed in the same directory as the generated package. But I have set the output property to a different directory. Maybe I am misunderstanding something...
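For reference, the tokenVocab setup I am using looks like this (grammar names as in my project):

```
parser grammar MyParser;

options { tokenVocab = MyLexer; }
```

ANTLR resolves MyLexer.tokens relative to the current directory or to the directory passed with the tool's -lib option, so when the lexer is generated into a separate output directory, that directory apparently has to be supplied via -lib when generating the parser.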

Here is a small example.

Andrei
  • Keywords have their own lexer id, which should be enough to identify them reliably. Why do you need another way? – Mike Lischke Jun 21 '19 at 07:01
  • I want to split the received terminals into groups to apply different syntax highlighting in a VS language extension. And it would be nice to define some group key in the lexer description in the grammar, if that is possible, of course. – Andrei Jun 21 '19 at 07:28

1 Answer


Depending on what else you are using the lexer for, there are two avenues you can explore.

  1. The type() lexer command to remap tokens.

    Taking the example from the docs there:

    lexer grammar SetType;
    tokens { STRING }
    DOUBLE : '"' .*? '"'   -> type(STRING) ;
    SINGLE : '\'' .*? '\'' -> type(STRING) ;
    WS     : [ \r\t\n]+    -> skip ;
    

    This would allow multiple rules for the single type STRING, which is the token type you would receive in your stream.
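Applied to the keywords from the question, this might look like the following sketch (the KEYWORD name is an assumption; it is declared via tokens so that the parser vocabulary knows it):

```
lexer grammar MyLexer;
tokens { KEYWORD }
INT   : 'int'   -> type(KEYWORD);  // both rules emit the same token type
FLOAT : 'float' -> type(KEYWORD);
```

Note that the parser will then only ever see KEYWORD, so this only works if the grammar does not need to distinguish the individual keywords.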

  2. The channel() command, which you can use to mark and filter the tokens once you have the token stream. This has the benefit of retaining the original lexer stream if you still need to parse afterwards.

    Again, stealing the example from the antlr docs:

    BLOCK_COMMENT
        : '/*' .*? '*/' -> channel(HIDDEN)
        ;
    LINE_COMMENT
        : '//' ~[\r\n]* -> channel(HIDDEN)
        ;
    
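Applied to the question, a custom channel could mark the keywords. This is only a sketch and the channel name KEYWORDS is my invention; be aware that tokens sent to a non-default channel are invisible to the parser, so this variant fits a pure lexing pass (e.g. for highlighting) better than a full parse:

```
lexer grammar MyLexer;
channels { KEYWORDS }
INT   : 'int'   -> channel(KEYWORDS);
FLOAT : 'float' -> channel(KEYWORDS);
```

After lexing you can walk the token stream and test each token's channel to pick out the marked keywords.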
SpencerPark
  • Thanks, but I got several problems with the first option. See the updates to the question. – Andrei Jun 24 '19 at 10:30
  • Hi, the edits to the question fundamentally change it and therefore deserve their own question! I suspect https://stackoverflow.com/questions/24299214/using-antlr-parser-and-lexer-separatly might answer it, but if not, in the new question the links to issues are good, but include the relevant source code directly (don't expect readers to download your source from OneDrive). Thanks! – SpencerPark Jun 26 '19 at 14:31