I have modified the PL/SQL parser by [Porcelli](https://github.com/porcelli/plsql-parser) and use it to parse PL/SQL files. After a successful parse, I print the AST. Now I want to edit the AST and print the PL/SQL source back out with the edits applied. How can I achieve this? How can I regenerate the source file from the AST, including comments, newlines, and whitespace? The formatting should also remain the same as in the original file. Any leads would be helpful.

Mohit Chawda
  • Aesthetic new lines/whitespace and comments will be lost in the parse; you won't get these back from the AST alone. – Xophmeister Jun 05 '14 at 12:50
  • @Xophmeister: there's lots of stuff you can't get back from an absolutely pure AST. If you want to get it back, the AST has to carry some additional information, which can be collected while parsing. See my answer. – Ira Baxter Jun 05 '14 at 14:00

2 Answers

The simple answer is "walk the tree, and spit out text that corresponds to the nodes". ANTLR offers "StringTemplates" as a basic kind of help, but in fact there's a lot of fine detail that needs to be addressed: indentation, literals and their formats, comments, and so on.
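As a rough illustration of "walk the tree and spit out text", here is a minimal sketch in Python. The `Node` class and the node kinds (`block`, `assign`, `ident`, `number`) are hypothetical and are not part of ANTLR or the Porcelli parser. Note that comments and the original spacing are not reproduced, which is exactly the fine detail mentioned above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # hypothetical node type, e.g. "block", "assign"
    text: str = ""            # literal text, used only by leaf nodes
    children: list = field(default_factory=list)

def unparse(node, indent=0):
    """Recursively turn a node into source text, one printing rule per kind."""
    pad = "  " * indent
    if node.kind in ("ident", "number"):
        return node.text
    if node.kind == "assign":
        target, value = node.children
        return f"{pad}{unparse(target)} := {unparse(value)};"
    if node.kind == "block":
        body = "\n".join(unparse(child, indent + 1) for child in node.children)
        return f"{pad}BEGIN\n{body}\n{pad}END;"
    raise ValueError(f"no printing rule for node kind {node.kind!r}")

tree = Node("block", children=[
    Node("assign", children=[Node("ident", "x"), Node("number", "1")]),
    Node("assign", children=[Node("ident", "y"), Node("ident", "x")]),
])
regenerated = unparse(tree)
print(regenerated)
```

The indentation here is chosen by the printer, not taken from the input, so the output is a normalized rendering rather than the original layout.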

See my SO answer on Compiling an AST back to source code for a lot more detail.

One thing not addressed there is the general need to reproduce the original character encoding of the file (if you can, sometimes you can't, e.g., you had an ASCII file but inserted a string containing a Unicode character).
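A minimal sketch of that encoding concern, assuming you recorded the file's original encoding when you read it. The helper name `encode_like_original` and the UTF-8 fallback are my own choices for illustration, not something a parser gives you:

```python
def encode_like_original(text, original_encoding):
    """Try to write the regenerated text in the file's original encoding.

    Falls back to UTF-8 (an assumption; pick whatever suits your tools)
    when the edited text contains characters the original encoding
    cannot represent. Returns the bytes and the encoding actually used.
    """
    try:
        return text.encode(original_encoding), original_encoding
    except UnicodeEncodeError:
        return text.encode("utf-8"), "utf-8"

unchanged = encode_like_original("x := 'abc';", "ascii")
edited = encode_like_original("x := 'caf\u00e9';", "ascii")
print(unchanged[1], edited[1])
```

The second call is the case described above: the file was ASCII, but the edit inserted a non-ASCII character, so the original encoding can no longer be reproduced.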

Ira Baxter

Each node in an AST comes with an index member that gives you the token's position in the input stream (the token stream, actually). When you examine the indexes in your AST, you will see that not all of them appear there: there are holes in the sequence of indexes. These holes are the positions of tokens that have been filtered out (usually whitespace and comments).

Your token stream, however, can give you the token at any given index and, importantly, can give you every token it found, regardless of the channel it is on. So your strategy could be to iterate over the tokens from your token stream and print them out as they come along. At each index, you can check your AST to see whether different output must be generated instead, or whether additional output must be appended.
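Here is a minimal sketch of that strategy in Python. It does not use ANTLR: the `Token` class, the toy `tokenize` lexer, and the `edits` dictionary (token index to replacement text, standing in for whatever you changed in the AST) are all hypothetical stand-ins for the real token stream and tree.

```python
import re
from dataclasses import dataclass

DEFAULT, HIDDEN = 0, 1  # channel numbers chosen for this sketch

@dataclass
class Token:
    index: int
    text: str
    channel: int = DEFAULT

# Toy lexer: line comments, whitespace, words, ":=", or any single character.
TOKEN_RE = re.compile(r"--[^\n]*|\s+|\w+|:=|.")

def tokenize(src):
    tokens = []
    for i, match in enumerate(TOKEN_RE.finditer(src)):
        text = match.group()
        hidden = text.startswith("--") or text.isspace()
        tokens.append(Token(i, text, HIDDEN if hidden else DEFAULT))
    return tokens

def rewrite(tokens, edits):
    """Emit every token in order, hidden ones included, swapping in edited text."""
    return "".join(edits.get(tok.index, tok.text) for tok in tokens)

source = "BEGIN\n  -- set the counter\n  x   := 1;\nEND;\n"
tokens = tokenize(source)

# Pretend the AST node for the literal 1 was changed to 42.
target = next(t.index for t in tokens if t.channel == DEFAULT and t.text == "1")
result = rewrite(tokens, {target: "42"})
print(result)
```

Because the hidden-channel tokens are printed verbatim, the comment and the odd spacing around `:=` survive; only the edited token differs from the input.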

Mike Lischke
  • The question is "How to print source from the AST". You say, "iterate over the tokens..."; OP reasonably assumes these are gone. – Ira Baxter Jun 08 '14 at 17:09
  • @IraBaxter The OP wants to regenerate the original source code + some changes added to the AST. Read between the lines. Well, it's so obvious, you don't need to read between lines actually... – Mike Lischke Jun 09 '14 at 10:19