I'm parsing SQL using the excellent JSQLParser library, which uses JavaCC internally.

The grammar consists of TOKENs and SPECIAL_TOKENs. The latter are used to keep single-line and multi-line comments out of the token stream that the parser sees, like this:

SPECIAL_TOKEN:
{
   < LINE_COMMENT: ("--" | "//") (~["\r","\n"])*>
|  < MULTI_LINE_COMMENT: "/*" (~["*"])* "*" ("*" | (~["*","/"] (~["*"])* "*"))* "/">
}

I can use the AST to find all the SPECIAL_TOKENs by following `.next` from the root node through the resulting nodes, but then I lose the structure: this gives me only the comment contents, without the parse context.
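
To make that walk concrete, here is a minimal sketch. It assumes the usual JavaCC token layout, where a regular token's `specialToken` field points at the nearest special token before it and earlier ones are reached by following `specialToken` again. The `Token` class below is a simplified stand-in for the generated one, and `attach`, `commentsBefore`, `describe` and `demo` are hypothetical helper names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SpecialTokenWalk {

    /** Simplified stand-in for the JavaCC-generated Token class (assumption). */
    public static class Token {
        public final String image;
        public Token next;          // following token; for a special token, the next special token in the same run
        public Token specialToken;  // nearest special token directly before this one, or null

        public Token(String image) { this.image = image; }
    }

    /** Returns the comments hanging off a regular token, in source order. */
    public static List<String> commentsBefore(Token t) {
        Deque<String> stack = new ArrayDeque<>();
        // The chain runs backwards through the source, so pushing reverses it.
        for (Token s = t.specialToken; s != null; s = s.specialToken) {
            stack.push(s.image);
        }
        return new ArrayList<>(stack);
    }

    /** Walks the regular tokens via next and lists each with its leading comments. */
    public static List<String> describe(Token first) {
        List<String> out = new ArrayList<>();
        for (Token t = first; t != null; t = t.next) {
            out.add(commentsBefore(t) + " " + t.image);
        }
        return out;
    }

    /** Hand-builds the links a token manager would create for a run of comments. */
    public static void attach(Token regular, String... comments) {
        Token prev = null;
        for (String c : comments) {
            Token s = new Token(c);
            s.specialToken = prev;
            if (prev != null) prev.next = s;
            prev = s;
        }
        regular.specialToken = prev;
    }

    /** Tokens for the start of the example query: two line comments, then SELECT * /* All cols *&#47; FROM aap. */
    public static Token demo() {
        Token select = new Token("SELECT");
        Token star = new Token("*");
        Token from = new Token("FROM");
        Token aap = new Token("aap");
        select.next = star;
        star.next = from;
        from.next = aap;
        attach(select, "-- 1. this is", "-- 2. an example");
        attach(from, "/* All cols */");
        return select;
    }

    public static void main(String[] args) {
        describe(demo()).forEach(System.out::println);
    }
}
```

In this sketch the `/* All cols */` comment ends up attached to `FROM`, the token that follows it, rather than to the `*` it describes. That is the kind of context a formatter would have to recover from somewhere else.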

I would like to use the context to implement a code-formatter.

My example:

                -- 1. this is 
                -- 2. an example 

                SELECT * /* All cols */  FROM aap -- two
                   JOIN b ON a.c=d.c
                   WHERE /* inline comment */ true
                -- an example
               ;

I want it to be formatted somewhat like this:

                -- 1. this is 
                -- 2. an example 

                SELECT 
                    * /* All cols */  
                FROM 
                    aap -- two
                JOIN 
                    b ON a.c=d.c
                WHERE /* inline comment */ true
                -- an example
                ;

What is the correct approach using JavaCC?

Rob Audenaerde
  • The tricky part is for the last comment `-- an example`: is it part of the SELECT query or not? – Maurice Perry Sep 19 '22 at 07:23
  • @MauricePerry the rest is not trivial either, imho. I added a `;` to be explicit about the SELECT statement. – Rob Audenaerde Sep 19 '22 at 07:50
  • Actually, the special tokens are linked to the token that follows, so it's not that much of a problem. – Maurice Perry Sep 19 '22 at 07:57
  • Yes, that is while you are traversing the AST. But the objects that follow from the parse rules don't have this information, as far as I can tell. So either I use the AST, where I have to do all the parsing myself, or I use the object tree from `parseStatement`, where I lose the special tokens. I was wondering if there was some way to combine them? – Rob Audenaerde Sep 19 '22 at 08:05
  • Are you using this: https://github.com/JSQLParser/JSqlParser ? – Maurice Perry Sep 19 '22 at 08:31
  • Yes. I have also looked at the approach in https://github.com/manticore-projects/jsqlformatter, but that seems to extract comments and later insert them again, which feels a bit laborious – Rob Audenaerde Sep 19 '22 at 08:33
  • I see your dilemma. Note that if you decide on the latter (`parseStatement`), you may want to implement a `CommonTokenAction` method to deal with the special tokens. – Maurice Perry Sep 19 '22 at 09:09
  • How would that help in the parse rules? – Rob Audenaerde Sep 20 '22 at 07:20
  • You may want to authorize comments only at certain places. – Maurice Perry Sep 20 '22 at 07:48
