Does the ANTLR4-generated code include anything like an unparser that can use the grammar and the parse tree to reconstruct the original source? How would I invoke it, if it exists? I ask because it might be useful in some applications and for debugging.
2 Answers
It really depends on what you want to achieve. Remember that lexer tokens which are put onto the HIDDEN channel (like comments and white space) are not parsed at all. The approach I used was:
- add user-specific information to the lexer token class
- parse the source and get the AST
- rewind the lexer (token source) and loop over all lexemes, including the hidden ones
- for each hidden lexeme, store a reference to it in the corresponding AST leaf
- so every AST leaf "knows" which white-space lexemes follow it
- recursively traverse the AST and print all the lexemes
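The steps above can be sketched as follows. This is a simplified, self-contained model, not ANTLR code: `Lexeme` and `TreeNode` are hypothetical stand-ins for a token and a tree node, and instead of subclassing the token class, each visible lexeme simply carries a list of the hidden lexemes that follow it. Hidden lexemes that appear before the first visible one have no leaf to attach to, so they are printed first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lexer token (not an ANTLR class).
final class Lexeme {
    final String text;
    final boolean hidden;                             // true for white space/comments on the HIDDEN channel
    final List<Lexeme> trailing = new ArrayList<>();  // hidden lexemes that follow this one
    Lexeme(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
}

// Hypothetical stand-in for a tree node; leaves reference a visible lexeme.
final class TreeNode {
    final Lexeme leaf;                                // null for interior nodes
    final List<TreeNode> children = new ArrayList<>();
    TreeNode(Lexeme leaf) { this.leaf = leaf; }
    TreeNode add(TreeNode child) { children.add(child); return this; }
}

final class Unparser {
    // Loop over the whole token stream (hidden lexemes included) and attach each
    // hidden lexeme to the most recent visible one. Lexemes before the first
    // visible one are returned separately. Mutates the lexemes: call once.
    static List<Lexeme> attachHidden(List<Lexeme> stream) {
        List<Lexeme> leading = new ArrayList<>();
        Lexeme lastVisible = null;
        for (Lexeme lx : stream) {
            if (!lx.hidden) {
                lastVisible = lx;
            } else if (lastVisible == null) {
                leading.add(lx);
            } else {
                lastVisible.trailing.add(lx);
            }
        }
        return leading;
    }

    // Recursively print each leaf followed by the hidden lexemes attached to it.
    static void print(TreeNode node, StringBuilder out) {
        if (node.leaf != null) {
            out.append(node.leaf.text);
            for (Lexeme h : node.leaf.trailing) out.append(h.text);
        }
        for (TreeNode child : node.children) print(child, out);
    }

    static String unparse(List<Lexeme> stream, TreeNode root) {
        StringBuilder out = new StringBuilder();
        for (Lexeme h : attachHidden(stream)) out.append(h.text);
        print(root, out);
        return out.toString();
    }
}
```

This only round-trips if every visible lexeme appears as a leaf, in source order. An ANTLR4 parse tree keeps each matched token as a terminal node, so a walk over its terminals should behave like this; an AST that drops tokens such as punctuation would lose them.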

- This is true if you use `-> channel(HIDDEN)`, but not if you use `-> skip`. The `skip` command completely suppresses token generation for a lexer rule, so the token stream will not contain any information about the text of those rules. – Sam Harwell Nov 04 '13 at 13:45
- I think I understand this, but is there an example of how to append these white-space nodes to the parse tree? How would you synchronize the token stream that still contains the HIDDEN tokens with the parse tree to know where to add the missing tokens? – Neil Pittman Nov 12 '13 at 21:13
- I did it in C++; I assume you need a Java example. Look at these links: http://www.antlr.org/wiki/pages/viewpage.action?pageId=1844 or at the TokenLabelType grammar option http://www.antlr.org/wiki/display/ANTLR3/Grammar+options. The first thing you have to do is subclass the Token class and add additional attributes to it. There are at least two ways to do that. – ibre5041 Nov 13 '13 at 08:46
- Yes, you can regenerate source text. It is unclear why you'd bother; you *already* have it; after all, you built the parse tree from something. The more interesting case is how to do this after you have *modified* the tree; now all of that "background" information is untrustworthy. For a general solution, you need to build a *prettyprinter*; see http://stackoverflow.com/a/5834775/120163 – Ira Baxter Apr 23 '15 at 09:05
Yes! ANTLR's infrastructure (usually) makes the original source data available.
In the default case, you will be using a CommonTokenStream. This inherits from BufferedTokenStream, which offers a whole slew of methods for getting at stuff.
The methods getHiddenTokensToLeft (and getHiddenTokensToRight) will get you lists of tokens that are not on the default channel. Those tokens will reveal their source text via getText().
What I find even more convenient is BufferedTokenStream.getText(interval), which will give you the text (hidden tokens included) over an Interval, which you can get from your tree element (RuleContext) with getSourceInterval().
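The behaviour of these methods can be illustrated with a small self-contained model. `SimpleToken` and `SimpleTokenStream` are hypothetical stand-ins, not ANTLR classes: `hiddenToLeft`/`hiddenToRight` mimic what getHiddenTokensToLeft/getHiddenTokensToRight do for a token index, and `getText(start, stop)` mimics getText(Interval) over a rule's source interval (inclusive token indexes).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token: its index in the stream, channel and text.
final class SimpleToken {
    static final int DEFAULT = 0, HIDDEN = 1;  // same numbers as Token.DEFAULT_CHANNEL / HIDDEN_CHANNEL
    final int index, channel;
    final String text;
    SimpleToken(int index, int channel, String text) {
        this.index = index; this.channel = channel; this.text = text;
    }
}

// Hypothetical stand-in for a buffered token stream that keeps every token.
final class SimpleTokenStream {
    private final List<SimpleToken> tokens;
    SimpleTokenStream(List<SimpleToken> tokens) { this.tokens = tokens; }

    // Off-default-channel tokens directly left of tokenIndex, back to the previous
    // default-channel token. (ANTLR's real method returns null, not an empty list,
    // when there are none.)
    List<SimpleToken> hiddenToLeft(int tokenIndex) {
        List<SimpleToken> out = new ArrayList<>();
        for (int i = tokenIndex - 1; i >= 0 && tokens.get(i).channel != SimpleToken.DEFAULT; i--) {
            out.add(0, tokens.get(i));  // prepend to keep source order
        }
        return out;
    }

    // Same, scanning right up to the next default-channel token.
    List<SimpleToken> hiddenToRight(int tokenIndex) {
        List<SimpleToken> out = new ArrayList<>();
        for (int i = tokenIndex + 1; i < tokens.size() && tokens.get(i).channel != SimpleToken.DEFAULT; i++) {
            out.add(tokens.get(i));
        }
        return out;
    }

    // Text of every token with index in [start, stop], hidden ones included.
    String getText(int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i <= stop; i++) sb.append(tokens.get(i).text);
        return sb.toString();
    }
}
```

Note that a rule's interval runs from its first to its last token, so hidden tokens between them are included, but a comment sitting just before the rule is not. With the real stream you could pick that up with getHiddenTokensToLeft(ctx.getStart().getTokenIndex()).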
To make use of your CommonTokenStream and its methods, you just need to pass it from where you create it (and set up your parser) to whatever class is examining the parse tree, such as your XXXBaseListener. I just gave my listener a constructor that stores the CommonTokenStream as an instance field.
So when I want the complete text for a rule ctx, I use this little method:
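```java
// tokenStream: the CommonTokenStream stored by the listener's constructor.
// Returns the rule's text, hidden-channel tokens between its first and last token included.
String originalString(ParserRuleContext ctx) {
    return this.tokenStream.getText(ctx.getSourceInterval());
}
```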
Alternatively, the tokens also contain line numbers and offsets, if you want to fiddle with those.
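If you go the offset route, the exact original characters of a rule can be sliced straight out of the input. The sketch below is a simplified model: `OffsetToken` and `SourceSlicer` are hypothetical names, with `OffsetToken` recording the same inclusive character offsets that ANTLR's Token.getStartIndex()/getStopIndex() return. Because it reads the input rather than the token stream, it also recovers text inside the span that a lexer rule discarded with `-> skip`, which the token-stream approach cannot. With the real runtime, the equivalent is `ctx.start.getInputStream().getText(Interval.of(ctx.start.getStartIndex(), ctx.stop.getStopIndex()))`, assuming `ctx.stop` is non-null (it may not be for empty or erroneous matches).

```java
// Hypothetical stand-in for a token's inclusive character offsets [startIndex, stopIndex].
final class OffsetToken {
    final int startIndex, stopIndex;
    OffsetToken(int startIndex, int stopIndex) {
        this.startIndex = startIndex; this.stopIndex = stopIndex;
    }
}

final class SourceSlicer {
    // Original characters from the first token's start to the last token's stop.
    // Reads the input itself, so skipped text inside the span is kept.
    static String originalText(String input, OffsetToken first, OffsetToken last) {
        return input.substring(first.startIndex, last.stopIndex + 1);  // stopIndex is inclusive
    }
}
```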
