ANTLR4 - Generate code from non-file inputs?

Question

Where do we start to manually build a CST from scratch? Or does ANTLR4 always require the lex/parse process as our input step?

I have some visual elements in my program that represent code structures.

e.g. a square represents a class, while a circle embedded within that square represents a method.

Now I want to turn those into code. How do I use ANTLR4 to do this, at runtime (using ANTLR4.js)? Most of the ANTLR examples seem to rely on lexing and parsing existing code to get to a syntax tree. So rather than:

input code->lex->parse->syntax tree->output code (1)

I want

manually create syntax tree->output code (2)

(Later, as the user adds code to that class and its methods, then ANTLR will be used as in (1).)

EDIT Maybe I'm misunderstanding this. Do I create some custom data structure and then run the parser over it? i.e. write structures to some in-memory format->parse->output code (3)?

ANTLR4 doesn't create any AST, it can only create a CST. If you want to build a CST yourself, what exactly prevents you from doing so? — Lucas Trzesniewski, Apr 17 '16 at 11:25
@LucasTrzesniewski I'm sorry, I'm new to this. By CST I presume you mean what ANTLR calls the parse tree. I wanted an AST because that would allow me to use the same source tree to read / write source in 2 languages (C & JS). But I don't need these at the same time... one project will be written in one _or_ the other... so nothing stops me from reading whatever the source for the current project is in (C _or_ JS) and then output accordingly. — Engineer, Apr 17 '16 at 13:31
@LucasTrzesniewski Can I safely assume that there is no way to manually build the tree in ANTLR4 (as opposed to 3)? And thus we must instead lex/parse some byte stream in order to build it? — Engineer, Apr 17 '16 at 13:34
Yes, a *concrete syntax tree* is just a synonym of a *parse tree*. If you want an AST with ANTLR4, you'll have to build one yourself, but that's pretty easy, I wrote a post about that [here](http://stackoverflow.com/questions/29971097/how-to-create-ast-with-antlr4/29996191#29996191). And if you want to stay at the CST level, nothing prevents you from building one yourself, that's exactly what ANTLR does under the hood. :) — Lucas Trzesniewski, Apr 17 '16 at 13:35
@LucasTrzesniewski Thanks a lot for the CST->AST example, that will be helpful. Can you give me a pointer on where to start if I wanted to manually build a CST? Or do we always use the lex/parse process as our input step? I'm confused about whether ANTLR4 always requires one to engage in the parse process even if one only wants to build entirely new structures (i.e. there are no input files, yet). Once I know this, I can accept as the answer. — Engineer, Apr 17 '16 at 13:50
I've never used the JS version, but basically in the other languages, you can just instantiate the relevant classes and fill their properties, nothing prevents you from doing that. Maybe I misunderstood something about your question though. — Lucas Trzesniewski, Apr 17 '16 at 15:52
@LucasTrzesniewski You hit the nail on the head. JS version is functionally identical to the others, AFAIK. "The relevant classes" - that's what I'm looking for (being new to all this). I don't think there are any canon examples of this being done in v4. I'd probably have to reverse engineer a ParseTree and look at parser source to figure out what's going on / how to build my own. Was hoping someone could save me the trouble. — Engineer, Apr 17 '16 at 16:04
One of the lessons of parser generators is they don't provide a complete set of support for manipulating source code. For example, with ANTLR, if you want an arbitrary CST, you have to build it yourself. If you only do a little bit of this, this isn't much of an issue. If you intend to do a lot of tree building and composition, parser generator tools don't offer you any help. Other tools (very few) offer you means to build arbitrary AST/CSTs directly from pattern specifications, and will let you easily compose/combine them or use them for pattern matching. ... — Ira Baxter, Apr 17 '16 at 16:15
... "was hoping someone could save me the trouble"... As an example, see my tools pattern matching language: http://www.semdesigns.com/Products/DMS/DMSRewriteRules.html — Ira Baxter, Apr 17 '16 at 16:16
You might find this discussion of AST/CST useful: http://stackoverflow.com/q/1888854/120163 You have another problem you haven't mentioned yet: given an assembled AST/CST, how are you going to get valid text back? See this answer on how to prettyprint: http://stackoverflow.com/questions/5832412/compiling-an-ast-back-to-source-code/5834775#5834775 — Ira Baxter, Apr 17 '16 at 16:20
@ArcaneEngineer yes by the "relevant classes" I mean the CST nodes that ANTLR generates for you, and looking at the generated parser source code will give you an idea of how those are instantiated. As Ira says, ANTLR is a parser generator, not a complete toolkit, so if you want to generate source code from a CST, you'll probably have to write a code-outputting visitor yourself. While you *may* be able to write a CST back to source form, if you manipulated it you may get wrong results (for instance you'll need to insert additional parenthesis nodes to ensure correct operator precedence). — Lucas Trzesniewski, Apr 17 '16 at 16:31
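
To illustrate the parenthesis point above, here is a minimal sketch of a code-outputting walker that adds parentheses based on operator precedence. Everything in it is an assumption made for illustration: the node shapes (`num`, `bin`), the `PRECEDENCE` table and the `print` function are hypothetical and are not ANTLR classes or APIs. It also assumes all operators are left-associative.

```javascript
// Hypothetical precedence table: a higher number binds tighter.
const PRECEDENCE = { "+": 1, "-": 1, "*": 2, "/": 2 };

// Hypothetical node constructors (plain objects, not ANTLR parse-tree nodes).
function num(value) { return { type: "num", value }; }
function bin(op, left, right) { return { type: "bin", op, left, right }; }

// Print an expression tree, adding parentheses only where the tree shape
// would otherwise be lost when the text is parsed again.
function print(node, parentPrec = 0, isRightOperand = false) {
  if (node.type === "num") return String(node.value);
  const prec = PRECEDENCE[node.op];
  const left = print(node.left, prec, false);
  const right = print(node.right, prec, true);
  const text = `${left} ${node.op} ${right}`;
  // Parenthesize when this node binds looser than its parent, or when it has
  // the same precedence and sits on the right of a left-associative operator.
  const needsParens = prec < parentPrec || (prec === parentPrec && isRightOperand);
  return needsParens ? `(${text})` : text;
}

// A tree built by hand (or produced by a rewrite), never parsed from text.
const product = bin("*", bin("+", num(1), num(2)), num(3));
const difference = bin("-", num(1), bin("-", num(2), num(3)));
const sum = bin("+", num(1), bin("*", num(2), num(3)));

console.log(print(product));    // (1 + 2) * 3
console.log(print(difference)); // 1 - (2 - 3)
console.log(print(sum));        // 1 + 2 * 3
```

The rule is deliberately conservative: it also parenthesizes `1 + (2 + 3)`, which is harmless but not strictly necessary.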
Sometimes one has to ask completely wrong questions, before one can ask the right ones. I'm not saying I won't use ANTLR if I have no other choices, but is there anything that doesn't cost hundreds / thousands of dollars that can do all these things I need? — Engineer, Apr 17 '16 at 16:34
The simple answer is "No" (although people will accuse me of bias). Parser Generators simply aren't enough. See Life After Parsing: http://www.semdesigns.com/Products/DMS/LifeAfterParsing.html — Ira Baxter, Apr 17 '16 at 18:08
@IraBaxter I might just agree with those people ;) I think ANTLR can just about achieve what I want, given its code generation templates and the text it stores in the ParseTree nodes which can be used to regenerate in close-to-original form. In conjunction with your pretty printing answer, I think good progress could be made. Oh, I don't doubt there'll be many bumps along the way but it looks remotely doable. Maybe. — Engineer, Apr 17 '16 at 20:01
There's no question about whether people can build their own infrastructure with enough enthusiasm and time; that's exactly what I have done over 20 years. What happens in practice is they fail to have enough of both and then the whole exercise is a waste of energy; I cannot tell you how many times I have seen this, starting back in the 1980s (source of my bias). You may be a special case. Maybe. Best of luck. [PS: we never did discuss what you were going to do about the preprocessor for C; that's another really complex issue]. — Ira Baxter, Apr 17 '16 at 20:11

GRosenberg · Accepted Answer · 2016-04-17T17:50:07.610

IIUC, you could use StringTemplate directly.

By way of background, Antlr itself builds an in-memory parse-tree and then walks it, incrementally calling StringTemplate to output code snippets qualified by corresponding parse-tree node data. That Antlr uses an internal parse-tree is just a convenience for simplifying walking (since Antlr is built using Antlr).

If you have your own data structure, regardless of its specific implementation, procedurally process it to progressively call ST templates to emit the corresponding code. And, you can directly use the same templates that Antlr uses (JavaScript.stg), if they meet your requirements.

Of course, if your data structure is of a nature that can be lex'd/parsed into a standard Antlr parse-tree, you can then use a standard Antlr visitor to call and populate node-specific templates.
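
As a rough sketch of the "walk your own data structure and call a template per node" idea, with no lexing or parsing involved: the example below uses plain JavaScript functions as stand-ins for StringTemplate templates, so it does not use the StringTemplate API or `JavaScript.stg`. The `model` shape, the `templates` table and the `emit` function are all hypothetical names chosen for illustration.

```javascript
// Hypothetical model of the visual elements from the question:
// a square represents a class, a circle inside it represents a method.
const model = {
  kind: "square",
  name: "Player",
  children: [
    { kind: "circle", name: "move", children: [] },
    { kind: "circle", name: "jump", children: [] },
  ],
};

// Stand-ins for templates, one per node kind. With StringTemplate these
// would be named templates in a group file; here they are plain functions.
const templates = {
  square: ({ name, body }) => `class ${name} {\n${body}\n}`,
  circle: ({ name }) => `  ${name}() {\n  }`,
};

// Walk the model depth-first: render the children first, then hand the
// node's own data plus the rendered children to the matching template.
function emit(node) {
  const template = templates[node.kind];
  if (!template) throw new Error(`No template for node kind: ${node.kind}`);
  const body = node.children.map(emit).join("\n\n");
  return template({ name: node.name, body });
}

const source = emit(model);
console.log(source);
```

Targeting another language (for example C instead of JS) would then mean swapping the `templates` table while the walk stays the same, which is roughly the separation StringTemplate is meant to give you.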

edited Apr 17 '16 at 17:50

answered Apr 17 '16 at 17:43

GRosenberg

This led me on a merry chase during which I discovered there is no C.stg for ANTLR4 (there is for ANTLR3; I am still considering that option). Writing an .stg is a particularly hardcore task that is not recommended for newbies to either ANTLR4 or the code gen target language. So it's ANTLR3 or some other way. Thanks. Also, what did you mean by "node-specific templates"? – Engineer Apr 17 '16 at 19:57
A node in your data-structure corresponding to a square would correlate to a class template; circle to a method template. The C.stg is not specific to Antlr3 or 4. It is specific to the version of StringTemplate. Don't think the syntax of StringTemplate has changed substantively in some time. – GRosenberg Apr 17 '16 at 20:09
Er, am I to understand that ANTLR works with templates at these levels? Would be good to have some sort of _handle_ for this, if so, so that I can go and look it up. – Engineer Apr 17 '16 at 20:10
StringTemplate.org – GRosenberg Apr 17 '16 at 20:12
