How to get from parse tree to Java class file

Question

I am working on a command-line tool with the following functionality:

1. Parse modified .java files using an extended ANTLR4 Java9 grammar. The syntax in the files is Java, with one modification: a method declaration may include a purpose, as in this example: public void {marketing} sendEmail() {}
2. Collect and remove all purposes using a visitor. Collection and analysis of the purposes is the main functionality of the program.
3. Compile and execute the Java files where the purposes are removed.

I am looking for the simplest and most effective way to achieve step 3. Building a full compiler is out of scope for my project, so I would prefer to use the existing Java compiler and run javac if possible. I have considered the following approaches, but none seems optimal:

Pretty-printing (from parse tree to source code) as proposed in this post: Compiling an AST back to source code. It could be a lot of work on large directories though.
Use ASM to generate bytecode, though as I understand it I would need valid Java source code or class files for this to work (https://asm.ow2.io/asm4-guide.pdf).
Build a Java compiler plugin to modify the AST and remove purposes at the parse step of the compilation (https://www.baeldung.com/java-build-compiler-plugin). I am unsure whether the compilation would fail before I can modify the AST, because the syntax is not valid Java.

Any input is much appreciated.

Builds are very complicated, and are done in all sorts of ways, such as Maven, Ant, Gradle, Make. Javac itself looks at all source files even if you give it one .java file. I would recommend creating a tool that simply converts the purpose-annotated source to Java one file at a time, and fitting it into the build tool that the user chose. The tool itself would parse the source, nuke the Purpose nodes, then output source from the modified tree into a .java file. There is no "pretty-printing" involved. Really trivial, e.g., "trparse src.pjava | trdelete '//purpose' | trtext > src.java". - kaby76, Mar 31 '22 at 14:50

score 1 · Accepted Answer · answered Apr 01 '22 at 08:05

You could use TokenStreamRewriter to get the source code without the purpose node (or accomplish many other rewriting tasks). Here's an example from an application where I conditionally add a top-level LIMIT clause to a MySQL query. The example is TypeScript, but TokenStreamRewriter is also part of the ANTLR4 Java runtime:

/**
001     * Parses the query to see if there's already a top-level limit clause. If none was found, the query is
002     * rewritten to include a limit clause with the given values.
003     *
004     * @param query The query to check and modify.
005     * @param serverVersion The version of MySQL to use for checking.
006     * @param sqlMode The current SQL mode in the server.
007     * @param offset The limit offset to add.
008     * @param count The row count value to add.
009     *
010     * @returns The rewritten query if the original query is error free and contained no top-level LIMIT clause.
011     *          Otherwise the original query is returned.
012     */
013    public checkAndApplyLimits(query: string, serverVersion: number, sqlMode: string, offset: number,
014        count: number): [string, boolean] {
015
016        this.applyServerDetails(serverVersion, sqlMode);
017        const tree = this.startParsing(query, false, MySQLParseUnit.Generic);
018        if (!tree || this.errors.length > 0) {
019            return [query, false];
020        }
021
022        const rewriter = new TokenStreamRewriter(this.tokenStream);
023        const expressions = XPath.findAll(tree, "/query/simpleStatement//queryExpression", this.parser);
024        let changed = false;
025        if (expressions.size > 0) {
026            // There can only be one top-level query expression where we can add a LIMIT clause.
027            const candidate: ParseTree = expressions.values().next().value;
028
029            // Check if the candidate comes from a subquery.
030            let run: ParseTree | undefined = candidate;
031            let invalid = false;
032            while (run) {
033                if (run instanceof SubqueryContext) {
034                    invalid = true;
035                    break;
036                }
037
038                run = run.parent;
039            }
040
041            if (!invalid) {
042                // Top level query expression here. Check if there's already a LIMIT clause before adding one.
043                const context = candidate as QueryExpressionContext;
044                if (!context.limitClause() && context.stop) {
045                    // OK, ready to add an own limit clause.
046                    rewriter.insertAfter(context.stop, ` LIMIT ${offset}, ${count}`);
047                    changed = true;
048                }
049            }
050        }
051
052        return [rewriter.getText(), changed];
053    }

What this code does:

Line 017: the input is parsed to get a parse tree. If you have done that already, you can pass in the parse tree, of course, instead of parsing again.
Line 022 prepares a new TokenStreamRewriter instance with your token stream.
Line 023 uses ANTLR4's XPath feature to get all nodes of a specific context type. This is where you can retrieve all your purpose contexts in one go. This would also cover your step 2.
The following lines only check if a new LIMIT clause must be added at all. Not so interesting for you.
Line 046 is the place where you manipulate the token stream. In this case something is added, but you can also replace or remove nodes.
Line 052 contains probably what you are most interested in: it returns the original text of the input, but with all the rewrite actions applied.
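
To illustrate why no pretty-printing is needed, here is a minimal stand-in for the rewriting idea that uses only the JDK, not ANTLR. The class name RangeRewriter and its methods are hypothetical and only model the concept: edits are recorded as ranges against the original text and applied when the text is rendered, so everything outside the removed ranges (whitespace, comments, formatting) comes out unchanged. In a real tool, the ranges would presumably come from the purpose contexts, for example via Token.getStartIndex() and Token.getStopIndex(), and the Java runtime's TokenStreamRewriter offers delete(...) and getText() for the same job.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical stand-in for a token stream rewriter: deletions are recorded
 * against the original text and only applied when getText() is called.
 */
final class RangeRewriter {
    private final String source;
    // Each entry is a half-open character range {start, end} to drop from the output.
    private final List<int[]> deletions = new ArrayList<>();

    RangeRewriter(String source) {
        this.source = source;
    }

    /** Schedules removal of source[start, end); the source itself is never modified. */
    void delete(int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IllegalArgumentException("bad range " + start + ".." + end);
        }
        deletions.add(new int[] {start, end});
    }

    /** Returns the original text with all scheduled deletions applied. */
    String getText() {
        List<int[]> sorted = new ArrayList<>(deletions);
        sorted.sort(Comparator.comparingInt((int[] range) -> range[0]));

        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] range : sorted) {
            // Copy the untouched text in front of this deletion.
            if (range[0] > pos) {
                out.append(source, pos, range[0]);
            }
            // Skip the deleted range; Math.max tolerates overlapping ranges.
            pos = Math.max(pos, range[1]);
        }
        out.append(source, pos, source.length());
        return out.toString();
    }
}
```

For the example from the question, scheduling a deletion of the range covering "{marketing} " in "public void {marketing} sendEmail() {}" and then calling getText() yields "public void sendEmail() {}".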

With the rewritten text returned by the rewriter you can create a temporary Java file for compilation. The same pass can also handle two items from your list at once: collecting the purposes and removing them.
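
The compile-and-run step could then look roughly like the following sketch, which uses the JDK's standard javax.tools API instead of spawning a javac process. The names CompileAndRun and compileAndInvoke are hypothetical, and the sketch assumes the rewritten source declares a public class in the default package with a public static no-argument method; a real tool would need to handle packages, multiple files, and the user's classpath.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical helper: compiles purpose-free source with the JDK compiler and runs it. */
final class CompileAndRun {
    /**
     * Writes the source to a temporary directory, compiles it, and invokes the
     * given public static no-argument method, returning its result.
     */
    static Object compileAndInvoke(String className, String source, String methodName) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK, not a JRE");
        }
        try {
            Path dir = Files.createTempDirectory("purpose-stripped");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));

            // Same effect as running: javac -d <dir> <file>
            int exitCode = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("javac failed with exit code " + exitCode);
            }

            // Load the freshly compiled class from the temporary directory.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Class<?> cls = Class.forName(className, true, loader);
                Method method = cls.getMethod(methodName);
                return method.invoke(null);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not compile or run " + className, e);
        }
    }
}
```

Because this calls the same compiler that javac uses, compile errors in the rewritten file are reported as ordinary javac diagnostics on standard error. If the project is built with Maven, Gradle, or similar, writing the rewritten .java files to a generated-sources directory and letting the build tool compile them may be the simpler route.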
