ANTLR4: Getting start and end index for each rule: $stop behaves strange

Question

I need to get the start and end index of each rule. I.e., the start index is the character position of the first character of the first token belonging to the rule and the end index is the last character position of the last token belonging to the rule. With these numbers I can crop the result of a rule out of the input file precisely.

The straightforward way to do this should be to use the $start and $stop tokens, i.e., $start.getStartIndex() and $stop.getStopIndex(). However, I have found that the $stop token is often null, even when used in the @after action.

According to The Definitive ANTLR 4 Reference, the $stop token is defined as: "The last nonhidden channel token to be matched by the rule. When referring to the current rule, this attribute is available only to the after and finally actions." This suggests that such a token should exist (at least for any rule that matches at least one token). It is therefore strange that the token is null in many cases, even for rules whose last element is a simple token rather than a subrule. How can a stop token be null in this case?

Right now, I am using a workaround: I ask the input stream for its current index, step back one token, and use that token as the stop token. However, this seems hacky:

@after {
    int start = $start.getStartIndex();
    int stop = _input.get(_input.index() - 1).getStopIndex();
    // do something with start and stop
}

The cleaner solution (if $stop were not null) would look like this:

@after {
    int start = $start.getStartIndex();
    int stop = $stop.getStopIndex();
}

score 3 · Accepted Answer · answered May 22 '14 at 21:55

The stop token is set in the finally block in the generated code, after any user-defined @finally{} action is executed. The @after{} code is executed in the try block, which also occurs before the stop token is set.
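The ordering described above can be modelled with a small stand-alone sketch. This is not the code ANTLR generates; StopTokenOrderDemo, Ctx, and rule are hypothetical stand-ins that only mimic the try/finally shape, so that the moment at which the stop token becomes non-null is visible.

```java
// Simplified model of the control flow described above. This is NOT the real
// ANTLR-generated code; every name here is a hypothetical stand-in.
final class StopTokenOrderDemo {

    /** Stand-in for a rule context that carries a stop token. */
    static final class Ctx {
        String stop;              // null until the rule has fully exited
        String stopSeenInAfter;   // what a user @after action would observe
        String stopSeenInFinally; // what a user @finally action would observe
    }

    /** Mimics the shape of a generated rule method. */
    static Ctx rule(String lastMatchedToken) {
        Ctx ctx = new Ctx();
        try {
            // ... the rule body would match its tokens here ...
            ctx.stopSeenInAfter = ctx.stop;   // @after runs inside the try block
        } finally {
            ctx.stopSeenInFinally = ctx.stop; // @finally runs first in the finally block
            ctx.stop = lastMatchedToken;      // only now is the stop token assigned
        }
        return ctx;
    }

    public static void main(String[] args) {
        Ctx ctx = rule("SEMI");
        System.out.println("seen in @after:   " + ctx.stopSeenInAfter);
        System.out.println("seen in @finally: " + ctx.stopSeenInFinally);
        System.out.println("after rule exit:  " + ctx.stop);
    }
}
```

In this model both user actions observe null, and only code that runs after the rule method has returned sees the stop token.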

The stop property only works for qualified references. For example, you could do the following:

foo : bar {assert $bar.stop != null;} ;

Also, note that ANTLR 4 is designed to encourage the relocation of action code from embedded actions to listener and/or visitor interfaces that operate on the parse tree after parsing is complete. When used in this manner, the stop tokens will be set for all contexts in the tree. In nearly all cases, the use of a @after or @finally block is a code smell in ANTLR 4 that you should avoid.
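To illustrate the post-parse approach without depending on the ANTLR runtime, here is a minimal sketch that uses hypothetical stand-in types (RuleTextDemo, Node, textOf, walk). With the real runtime, the two offsets would presumably come from ctx.getStart().getStartIndex() and ctx.getStop().getStopIndex() inside a listener or visitor. The point worth noting is that the stop index is inclusive, so cropping needs stopIndex + 1.

```java
// Hypothetical stand-ins for parse-tree types; the real data would come from
// ParserRuleContext.getStart()/getStop() and Token.getStartIndex()/getStopIndex().
final class RuleTextDemo {

    /** A matched rule with the character offsets of its first and last token. */
    static final class Node {
        final String rule;
        final int startIndex; // offset of the first character of the first token
        final int stopIndex;  // offset of the last character of the last token (inclusive)
        final Node[] children;

        Node(String rule, int startIndex, int stopIndex, Node... children) {
            this.rule = rule;
            this.startIndex = startIndex;
            this.stopIndex = stopIndex;
            this.children = children;
        }
    }

    /** Crops the text a rule matched; stopIndex is inclusive, hence the + 1. */
    static String textOf(String input, Node node) {
        return input.substring(node.startIndex, node.stopIndex + 1);
    }

    /** Walks the finished tree, where every node already has both offsets set. */
    static void walk(String input, Node node, StringBuilder out) {
        out.append(node.rule).append(": ").append(textOf(input, node)).append('\n');
        for (Node child : node.children) {
            walk(input, child, out);
        }
    }

    public static void main(String[] args) {
        String input = "x = 1 + 2;";
        Node expr = new Node("expr", 4, 8);       // covers "1 + 2"
        Node stat = new Node("stat", 0, 9, expr); // covers the whole statement
        StringBuilder out = new StringBuilder();
        walk(input, stat, out);
        System.out.print(out);
    }
}
```

Because the walk happens after parsing has finished, no node is visited before its stop offset is known, which is the property the embedded-action approach lacks.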

Sam Harwell

But then, the definitive reference is wrong, as it states that the `$stop` token *is* available in the `@after` and `@finally` actions. But thanks for clearing this up! – gexicide May 23 '14 at 07:45
I have found this issue very frustrating as well. When I am trying to get the text associated with a rule, as per [this question](https://stackoverflow.com/questions/16343288/how-do-i-get-the-original-text-that-an-antlr4-rule-matched) for instance, having stop be null is a severe problem, often making it impossible to tell what text a rule matched. This seems like a pretty bad bug. I think ANTLR should guarantee that stop has been set before invoking any user action. Has an ANTLR bug been filed for this issue? – Some Guy Jan 25 '19 at 14:01
