Is it always possible to go from AST to original source code?

Question

JavaScript source code can be converted to an AST. I am using the Shift AST parser to create an AST from JavaScript code.

Now I want to convert the generated AST back to source code.

I am quite confused here and am trying to understand the fundamentals. My colleagues tell me that an AST can't be converted back to source code. But for what reason?

One of my colleagues told me that an AST does not preserve spacing, so indentation will be lost when converting the AST back to source code.

Is that the only reason?

recovering spacing and indentation is definitely not going to stop you - a decent editor can format javascript in a heartbeat — Bravo, Aug 14 '21 at 01:12
@Bravo so my colleague's comment is correct that indentation is not preserved in the AST? — Exploring, Aug 14 '21 at 01:16
But discarded comments cannot be recovered. Besides, some people want to preserve the original spacing and indentation for reasons. For example, tools like `git diff` are difficult to use effectively if spacing / indentation is changed. — Stephen C, Aug 14 '21 at 01:16
Yes. Your colleague's comment is correct. (So is that the only reason you asked this? Because you don't believe your colleague? But what if we are all wrong too??? ) — Stephen C, Aug 14 '21 at 01:16
@StephenC thanks - so indentation and comments are the only limitations? — Exploring, Aug 14 '21 at 01:17
Babeljs can preserve comments - I'm assuming the "AST" referenced in babeljs is the same "AST" being asked about — Bravo, Aug 14 '21 at 01:17
Well ... yes ... it all depends on what AST implementation you are talking about. It is possible to design an AST to preserve absolutely everything from the original source code. — Stephen C, Aug 14 '21 at 01:19
@Exploring - no, they aren't a limitation at all. Depends on what you mean by "original source" - if you want every bad indentation, odd spacing, byte for byte from the original code, then no, you probably can't reproduce that - I guess the question I have is - why do you want to? You can, of course, produce valid source code, just not the original source code — Bravo, Aug 14 '21 at 01:23
I feel like this is an AB question. When I think AST, I'm thinking of a temporary structure that is used to transform to something else. Do you want to *just* keep the AST? Are you getting rid of the source? Why do you want to go from AST to source? Some implementations may make this possible, but that isn't really what I think it is meant for. What is the reasoning? — zero298, Aug 14 '21 at 01:48
Of course not. Even a CST traditionally does not retain much of the token stream (white space and comments). An AST is a tree rewrite of the CST, where traditionally even less information is kept. The rewrite function may not be one-to-one, further making source reconstruction a problem. Suppose you have a language that has keywords that are case insensitive, so keyword "func" and "Func" are equivalent, but where the AST no longer keeps the token because the AST only denotes that we are defining a function. The problem is exacerbated with preprocessing, a pipeline of parsers/reconstructors. — kaby76, Aug 14 '21 at 15:03
This SO answer tells you precisely how to reconstruct source text from CSTs/ASTs: https://stackoverflow.com/a/5834775/120163 — Ira Baxter, Aug 19 '21 at 11:52

Stephen C · Answer 1 · 2021-08-14T02:16:54.677

First of all, it depends on what you mean by "original source" code.

If you mean the exact same file on the exact same file system that you were editing when you wrote the software, the answer is that you can't. Obviously.
If you mean code which is character for character identical to the code you wrote, it is technically possible, but only if the AST keeps the spacing, comments and other lexical detail, which is unlikely in practice.
If you mean code that works exactly the same and "looks mostly the same", then yes, you can. (Depending on "mostly".)

The answer also depends on what AST implementation you are talking about.

Some AST implementations don't preserve comments or spacing / indentation.
Other AST implementations apparently can preserve comments; e.g. as decorations on the tree nodes.
It is theoretically possible for an AST implementation to preserve absolutely everything needed to reconstruct identical source code. (But I don't know of an example that does. It would be memory expensive and kind of pointless.)
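As a rough illustration of the "preserve absolutely everything" case, here is a minimal sketch of a toy lossless tokenizer. The names `tokenize` and `reconstruct` are hypothetical and are not part of Shift or Babel. It assumes that each token simply carries the whitespace and comments ("trivia") that preceded it, so concatenating trivia and token text gives back the input character for character.

```javascript
// Toy lossless tokenizer (illustrative only, not a real JavaScript lexer).
// Every token remembers the trivia (whitespace and comments) in front of it.
function tokenize(source) {
  // Trivia: runs of whitespace, // line comments, /* block comments */.
  const triviaRe = /(?:\s+|\/\/[^\n]*|\/\*[\s\S]*?\*\/)*/y;
  // Token: an identifier, a number, or any other single character.
  const tokenRe = /[A-Za-z_$][\w$]*|\d+|[\s\S]/y;
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    triviaRe.lastIndex = pos;
    const trivia = triviaRe.exec(source)[0]; // may be the empty string
    pos += trivia.length;
    tokenRe.lastIndex = pos;
    const match = tokenRe.exec(source);
    const text = match ? match[0] : ""; // empty when only trailing trivia is left
    pos += text.length;
    tokens.push({ trivia, text });
  }
  return tokens;
}

// Rebuild the source by emitting each token's trivia followed by its text.
function reconstruct(tokens) {
  return tokens.map((t) => t.trivia + t.text).join("");
}

const original = "let  total = 1; // keep me\n\tlet y  =  total+2 ;\n";
const tokens = tokenize(original);
console.log(reconstruct(tokens) === original); // true

// Dropping the trivia, as a typical AST does, loses spacing and comments.
console.log(tokens.map((t) => t.text).join(" "));
```

A tree built on top of tokens like these could be printed back unchanged, at the cost of storing the extra strings on every token.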

What is the harm in not being able to recover comments?

Well it depends on what you want to use the regenerated source code for. If you intend to be able to replace the original code, then there are clear problems:

You have lost any (hopefully) useful comments that the programmer included to help people to understand the code.
It is common to embed formal API documentation in the form of stylized source code comments that are then extracted, formatted, etc. If those comments are lost, it becomes harder to keep the API documentation up to date.
Some 3rd-party tools use stylized comments for specific purposes. For example, a comment could be used to suppress a false positive from a static code analyzer; e.g. a `# noqa` comment in Python code suppresses a pep8 style error.

On the other hand ... this kind of thing may not be relevant for your use-case.

Now from the tags I deduce that you are using Shift-AST. From a brief scan of the documentation and source code, I don't think this preserves either comments or indentation / spacing.

So that means that you cannot recover source code that is character for character identical with the original code. If that is what you want ... your colleague is 100% correct.

However, character for character identical code may not be necessary, so this may not be a limitation. It depends on your use-case.

And you could investigate Babel as an alternative. Apparently it can preserve comments.

"One of my colleagues told me that an AST does not preserve spacing, so indentation will be lost when converting the AST back to source code. Is that the only reason?"

Clearly, no. (As my answer explains.)

"*I don't know of an example that does. It would be memory expensive and kind of pointless*" - linters and autoformatters use these. However, it's no longer called an *abstract* syntax tree but rather a [parse tree](https://stackoverflow.com/q/1888854/1048572) or [concrete syntax tree](https://stackoverflow.com/q/1888854/1048572) — Bergi, Aug 14 '21 at 05:22

score 1 · Answer 2 · answered Aug 14 '21 at 01:41

Nope. An abstract syntax tree is abstract in that it abstracts away syntactic detail that does not affect the meaning of the program, such as whitespace and possibly also comments (if these are irrelevant to further processing). As there is usually no purpose in storing this information, it is dropped during parsing.

While one can't go back to "original source code", one can still go back to an equivalent representation which is usually called the canonical form.
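The idea of a canonical form can be sketched with a toy arithmetic grammar. This is a minimal illustration, not how Shift works internally: `parseExpr` and `printExpr` are hypothetical names, and the grammar covers only numbers, `+`, `*` and parentheses. Two differently written inputs produce the same tree, and the printer emits one fixed layout for that tree.

```javascript
// Toy parser: the tree keeps only structure, not spacing or redundant parentheses.
function parseExpr(source) {
  const tokens = source.match(/\d+|[+*()]/g) || []; // whitespace is discarded here
  let pos = 0;
  function primary() {
    const tok = tokens[pos++];
    if (tok === "(") {
      const inner = sum();
      pos++; // skip ")"
      return inner; // the parentheses themselves leave no node behind
    }
    return { type: "Num", value: Number(tok) };
  }
  function product() {
    let node = primary();
    while (tokens[pos] === "*") {
      pos++;
      node = { type: "Mul", left: node, right: primary() };
    }
    return node;
  }
  function sum() {
    let node = product();
    while (tokens[pos] === "+") {
      pos++;
      node = { type: "Add", left: node, right: product() };
    }
    return node;
  }
  return sum();
}

// Printer: one fixed spacing style, parentheses only where the tree needs them.
const PRECEDENCE = { Add: 1, Mul: 2, Num: 3 };
function printExpr(node, minPrecedence = 0) {
  if (node.type === "Num") return String(node.value);
  const p = PRECEDENCE[node.type];
  const op = node.type === "Add" ? " + " : " * ";
  const text = printExpr(node.left, p) + op + printExpr(node.right, p + 1);
  return p < minPrecedence ? "(" + text + ")" : text;
}

const compact = "(1+2)*3";
const spread = "( (1) +  2 )\n * (3)";
console.log(printExpr(parseExpr(compact))); // (1 + 2) * 3
console.log(printExpr(parseExpr(compact)) === printExpr(parseExpr(spread))); // true
```

Neither input can be recovered from the tree, but both map to the same printed text, which parses back to the same tree.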

Jonas Wilms

Also grouping operators, semicolons, trailing commas, etc. – Bergi Aug 14 '21 at 05:14

score 0 · Answer 3 · answered Aug 14 '21 at 02:04

Well, it's possible. If you're using shift-ast you can do it.

Step 1:

npm install shift-codegen

Step 2:

import codegen from "shift-codegen";
let programSource = codegen(/* Shift format AST */);

`programSource` will be a string. Write it to your file and use Prettier to format the code.

An alternative to shift-ast is Babel, which offers many benefits such as transforms and template features. It also supports TypeScript, Flow, JSX, comments and minification.
