What I usually do first is dump the tokens, to check whether the lexer actually creates the tokens the parser expects.
You can do that with a small test class like this (easily ported to Python):
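import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

public class Main {

    // Tokenizes the input and prints each token's symbolic name and text.
    static void test(String input) {
        // ANTLRInputStream is deprecated as of ANTLR 4.7; CharStreams.fromString(input) replaces it.
        metrinkLexer lexer = new metrinkLexer(new ANTLRInputStream(input));
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        tokenStream.fill();
        System.out.printf("input: `%s`\n", input);
        for (Token token : tokenStream.getTokens()) {
            // Skip the EOF token that fill() appends to the stream.
            if (token.getType() != Token.EOF) {
                System.out.printf(" %-20s %s\n", metrinkLexer.VOCABULARY.getSymbolicName(token.getType()), token.getText());
            }
        }
        System.out.println();
    }

    public static void main(String[] args) throws Exception {
        test("-1d metric('blah', 'blah', 'blah')");
    }
}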
If you run the code above, the following will get printed to your console:
input: `-1d metric('blah', 'blah', 'blah')`
 MINUS                -
 INTEGER_LITERAL      1
 IDENTIFIER           d
 METRIC               metric
 LPAREN               (
 STRING_LITERAL       'blah'
 COMMA                ,
 STRING_LITERAL       'blah'
 COMMA                ,
 STRING_LITERAL       'blah'
 RPAREN               )
As you can see, the d is being tokenized as an IDENTIFIER instead of a TIME_INDICATOR. This is because the IDENTIFIER rule is defined before your TIME_INDICATOR rule. The lexer does not "listen" to what the parser might need: it simply matches as many characters as possible, and if two or more rules match the same number of characters, the rule defined first "wins".
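The selection rule described above can be sketched in plain Java. This is a toy model, not ANTLR's actual implementation: the class name LexerTieBreakDemo, the helper winningRule, and the two regular expressions standing in for the IDENTIFIER and TIME_INDICATOR rules are all assumptions made for illustration, since the real rule definitions are in your grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of how a lexer picks a rule; NOT ANTLR's actual implementation.
class LexerTieBreakDemo {

    // Returns the name of the rule that wins at the start of the input:
    // the longest match wins, and on a tie the rule listed first wins.
    static String winningRule(Map<String, String> rulesInOrder, String input) {
        String winner = null;
        int longest = 0;
        for (Map.Entry<String, String> rule : rulesInOrder.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // lookingAt() anchors the match at the start of the input.
            // The strict '>' means a later rule never displaces an earlier one on a tie.
            if (m.lookingAt() && m.end() > longest) {
                longest = m.end();
                winner = rule.getKey();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // Hypothetical rule bodies, listed in definition order.
        Map<String, String> identifierFirst = new LinkedHashMap<>();
        identifierFirst.put("IDENTIFIER", "[a-zA-Z_][a-zA-Z_0-9]*");
        identifierFirst.put("TIME_INDICATOR", "[shmd]");

        Map<String, String> timeIndicatorFirst = new LinkedHashMap<>();
        timeIndicatorFirst.put("TIME_INDICATOR", "[shmd]");
        timeIndicatorFirst.put("IDENTIFIER", "[a-zA-Z_][a-zA-Z_0-9]*");

        System.out.println(winningRule(identifierFirst, "d"));       // IDENTIFIER (tie, first rule wins)
        System.out.println(winningRule(timeIndicatorFirst, "d"));    // TIME_INDICATOR (tie, first rule wins)
        System.out.println(winningRule(timeIndicatorFirst, "days")); // IDENTIFIER (longest match wins)
    }
}
```

Note that in this model, putting TIME_INDICATOR first means a lone d can never come out as an IDENTIFIER, whatever the parser would like at that point.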
So, d can be tokenized either as a TIME_INDICATOR or as an IDENTIFIER. If this depends on context, I suggest you tokenize it as an IDENTIFIER (and remove TIME_INDICATOR) and create a parser rule like this:
relative_time_literal:
    MINUS? INTEGER_LITERAL time_indicator;
time_indicator:
    {_input.LT(1).getText().matches("[shmd]")}? IDENTIFIER;
The { ... }? is called a predicate; see "Semantic predicates in ANTLR4?".
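To make the check inside the predicate concrete: in the Java target, _input.LT(1) is the next token the parser has not yet consumed, and String.matches only succeeds if the whole text matches the pattern. The sketch below reproduces that check without ANTLR; the class name TimeIndicatorCheckDemo and the helper isTimeIndicator are made up for illustration.

```java
// Plain-Java illustration of the check inside the predicate; no ANTLR required.
class TimeIndicatorCheckDemo {

    // Mirrors the predicate body: true only if the whole token text is one of s, h, m, d.
    static boolean isTimeIndicator(String tokenText) {
        return tokenText.matches("[shmd]");
    }

    public static void main(String[] args) {
        System.out.println(isTimeIndicator("d"));    // true
        System.out.println(isTimeIndicator("days")); // false: matches() must cover the whole text
        System.out.println(isTimeIndicator("x"));    // false: not one of s, h, m, d
        System.out.println(isTimeIndicator("D"));    // false: the character class is case-sensitive
    }
}
```

So an IDENTIFIER token is accepted as a time_indicator only when its text is exactly s, h, m or d; any other identifier makes the predicate fail and the alternative is not taken.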
Also, the FALSE and TRUE rules will need to be placed before the IDENTIFIER rule, for the same reason: when both rules match the same text, the one defined first wins.