In Mathematica a comment starts with (*
and ends with *)
and comments can be nested. My current approach of scanning a comment with JFlex contains the following code
%xstate IN_COMMENT
"(*" { yypushstate(IN_COMMENT); return MathematicaElementTypes.COMMENT;}
<IN_COMMENT> {
"(*" {yypushstate(IN_COMMENT); return MathematicaElementTypes.COMMENT;}
[^\*\)\(]* {return MathematicaElementTypes.COMMENT;}
"*)" {yypopstate(); return MathematicaElementTypes.COMMENT;}
[\*\)\(] {return MathematicaElementTypes.COMMENT;}
. {return MathematicaElementTypes.BAD_CHARACTER;}
}
where the methods yypushstate
and yypopstate
are defined as
private final LinkedList<Integer> states = new LinkedList();
private void yypushstate(int state) {
states.addFirst(yystate());
yybegin(state);
}
private void yypopstate() {
final int state = states.removeFirst();
yybegin(state);
}
to give me the opportunity to track how many nested levels of comment I'm dealing with.
Unfortunately, this results in several COMMENT
tokens for one comment, because I have to match nested comment starts and comment ends.
Question: Is it possible with JFlex to use its API with methods like yypushback
or advance()
etc. to return exactly one token over the whole comment range, even if comments are nested?