Using JavaCC to format my input file: how to handle comments?

Question

I'm parsing SQL using the excellent JSQLParser library, which uses JavaCC internally.

The grammar consists of TOKENs and SPECIAL_TOKENs. The latter are used to keep single-line and multi-line comments out of the token stream that the parser sees, like this:

SPECIAL_TOKEN:
{
   < LINE_COMMENT: ("--" | "//") (~["\r","\n"])*>
|  < MULTI_LINE_COMMENT: "/*" (~["*"])* "*" ("*" | (~["*","/"] (~["*"])* "*"))* "/">
}

I can use the AST to find all the SPECIAL_TOKENs by following .next from the root node and the resulting nodes, but then I lose the structure: I get just the comment contents, without the parse context.

I would like to use the context to implement a code-formatter.

My example:

                -- 1. this is 
                -- 2. an example 

                SELECT * /* All cols */  FROM aap -- two
                   JOIN b ON a.c=d.c
                   WHERE /* inline comment */ true
                -- an example
               ;

I want it to be formatted somewhat like this:

                -- 1. this is 
                -- 2. an example 

                SELECT 
                    * /* All cols */  
                FROM 
                    aap -- two
                JOIN 
                    b ON a.c=d.c
                WHERE /* inline comment */ true
                -- an example
                ;

What is the correct approach using JavaCC?

The tricky part is for the last comment `-- an example`: is it part of the SELECT query or not? — Maurice Perry, Sep 19 '22 at 07:23
@MauricePerry the rest is not trivial either, imho. I added a `;` to be explicit about the SELECT statement. — Rob Audenaerde, Sep 19 '22 at 07:50
Actually, the special tokens are linked to the token that follows, so it's not that much of a problem. — Maurice Perry, Sep 19 '22 at 07:57
Yes, that is while you are traversing the AST. But the objects that follow from the parse rules don't have this information, as far as I can tell. So either I use the AST, where I have to do all the parsing myself, or I use the object tree from `parseStatement`, where I lose the special tokens. I was wondering if there was some way to combine them? — Rob Audenaerde, Sep 19 '22 at 08:05
Are you using this: https://github.com/JSQLParser/JSqlParser ? — Maurice Perry, Sep 19 '22 at 08:31
Yes. I have also looked at the approach in https://github.com/manticore-projects/jsqlformatter, but that seems to extract comments and later insert them again, which feels a bit laborious — Rob Audenaerde, Sep 19 '22 at 08:33
I see your dilemma. Note that if you decide for the latter (`parseStatement`), you may want to implement a `CommonTokenAction` method to deal with the special tokens. — Maurice Perry, Sep 19 '22 at 09:09
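
To illustrate the point in the comments that special tokens are linked to the token that follows, here is a minimal sketch of walking that chain. It assumes the standard layout of the `Token` class that JavaCC generates (the `image`, `next` and `specialToken` fields). The nested `Token` class below is only a stand-in with the same field names so that the example runs without a generated parser; `SpecialTokenWalker`, `leadingComments`, `render` and `sample` are hypothetical names, not part of JavaCC or JSQLParser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of JavaCC or JSQLParser.
class SpecialTokenWalker {

    // Minimal stand-in for the JavaCC-generated Token class (same field names).
    static class Token {
        String image;        // text matched by the token
        Token next;          // next regular token in the stream
        Token specialToken;  // last special token before this one, or null

        Token(String image) {
            this.image = image;
        }
    }

    // Collects the special tokens (comments) attached to t, in source order.
    // The chain is linked backwards, so it is walked and then reversed.
    static List<String> leadingComments(Token t) {
        List<String> comments = new ArrayList<>();
        for (Token s = t.specialToken; s != null; s = s.specialToken) {
            comments.add(s.image);
        }
        Collections.reverse(comments);
        return comments;
    }

    // Emits every regular token on its own line, preceded by its comments.
    // A real formatter would choose line breaks and indentation per token kind.
    static String render(Token first) {
        StringBuilder out = new StringBuilder();
        for (Token t = first; t != null; t = t.next) {
            for (String comment : leadingComments(t)) {
                out.append(comment).append('\n');
            }
            out.append(t.image).append('\n');
        }
        return out.toString();
    }

    // Hand-built chain for the first part of the example query,
    // linked the way a JavaCC token manager would link it.
    static Token sample() {
        Token select = new Token("SELECT");
        Token star = new Token("*");
        Token from = new Token("FROM");
        Token aap = new Token("aap");
        select.next = star;
        star.next = from;
        from.next = aap;

        Token first = new Token("-- 1. this is");
        Token second = new Token("-- 2. an example");
        second.specialToken = first;   // comments are chained backwards
        select.specialToken = second;  // SELECT points at the last one

        from.specialToken = new Token("/* All cols */");
        return select;
    }

    public static void main(String[] args) {
        System.out.print(render(sample()));
    }
}
```

With this linking, the two leading comments come out before `SELECT`, and `/* All cols */` is emitted before `FROM` rather than after `*`, because a special token hangs off the regular token that follows it. A formatter built this way would therefore need its own rule for deciding when a comment belongs to the preceding token (for example, when both are on the same source line).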
