How to parse C# code then serialize it back to text file

Question

I found that there is C# antlr4 grammar. I build antlr4 parser for C# for that grammar. It works. I can walk the parse tree and see that there are some nodes that have some children.

Now I would like to generate the C# source code back from this parse tree.

Can I somehow generate (out of grammar) inverse of antlr parser that instead of parsing, when given parse tree will generate source code that would result in this parse tree?

EDIT:

My current attempt in CoffeeScript is to walk the source tree decorating it with pieces of source code using original source and start and stop positions that antlr puts in nodes, and then walk it again to print the source code. The only problem is that multiple nodes start at exactly same space in the source code. To deal with that I have some nasty logic to put source pieces only in the deepest node:

antlr = require 'antlr4'
{CSharp4Parser} = require './CSharp4Parser'
{CSharp4Lexer} = require './CSharp4Lexer'

input = "namespace A { class B {}; class C {} }"

cstream = new antlr.InputStream(input)
lexer = new CSharp4Lexer(cstream)
tstream = new antlr.CommonTokenStream(lexer)
parser = new CSharp4Parser(tstream)
parser.buildParseTrees = true ;

tree = parser.compilation_unit();

decorateWithSource = new antlr.tree.ParseTreeListener();

start =
  prev: null

stop =
  prev: null

o = (msg) -> process.stdout.write(msg)

decorateWithSource.enterEveryRule = (a) ->
  if start.prev
    start.prev.before = input.substr(start.prev.start.start, a.start.start - start.prev.start.start)

  if stop.prev
    stop.prev.after = input.substr(stop.prev.stop.start, a.start.start - stop.prev.stop.start)

  start.prev = a
  stop.prev = null

decorateWithSource.exitEveryRule = (a) ->
  if start.prev
    start.prev.before = input.substr(start.prev.start.start, a.stop.start - start.prev.start.start)

  if stop.prev
    stop.prev.after = input.substr(stop.prev.stop.start, a.stop.start - stop.prev.stop.start)

  start.prev = null
  stop.prev = a

walker = new antlr.tree.ParseTreeWalker();
walker.walk(decorateWithSource, tree);
stop.prev.after = input.substr(stop.prev.stop.start)

printOut = new antlr.tree.ParseTreeListener();
printOut.enterEveryRule = (a) ->
  o (a.before || ''), ' -> '+parser.ruleNames[a.ruleIndex]
printOut.exitEveryRule = (a) ->
  o (a.after || ''), ' < '+parser.ruleNames[a.ruleIndex]
walker.walk(printOut, tree);

What I'm trying to do is read C# source file (that comes from recompiled that botched some things) into a tree, then pass it through transformer written in OMeta (that narrows down my environment to languages that have OMeta implementation like C#, js or coffeescript, possibly others) and then write back the fixed source code.

Maybe manually walking parse tree to generate the code will be good enough for me.

If you're also writing in C#, wouldn't CodeDOM work for this? It's kind of built-in too. — Matti Virkkunen, May 15 '15 at 15:36
"CodeDOM" is synonym for "bad (granular) syntax tree". No, it isn't effective for C# parsed down to the level of tokens in expressions. — Ira Baxter, May 15 '15 at 16:41
Thanks for pointing me to CodeDOM. I couldn't find it myself. I'm not yet sure if it's granular enough for me but if is (and I need access to single operators and arguments in the expression) and you put it as an Answer I'd accept it. — Kamil Szot, May 15 '15 at 17:31
@Kamil: CodeDOM doesn't go down to the level of operators. As I said, bad syntax tree. — Ira Baxter, May 15 '15 at 18:02
It doesn't look good for CodeDOM... Someone on the internet wrote that it doesn't support language specific features like `foreach`. I think I'll need to just polish up my hacky solution. — Kamil Szot, May 15 '15 at 18:26
@Kamil: take a look also at this [topic](https://stackoverflow.com/questions/16338131/using-roslyn-to-parse-transform-generate-code-am-i-aiming-too-high-or-too-low) — Rudolf FERENC, May 18 '15 at 09:37

score 3 · Answer 1 · edited May 23 '17 at 12:33

3

Not easily; ANTLR isn't really designed to do this.

You can investigate StringTemplates, which will let you walk the tree and spit out code that is roughly right.

If you want to regenerate the source in good detail, this isn't enough. See my SO answer on How to build a prettyprinter.

edited May 23 '17 at 12:33

Community

1
1

answered May 15 '15 at 15:34

Ira Baxter

93,541
22
172
341

I might not necessarily need too detailed control over produced source code. I just need result to compile to same binary that the original source did. I can prettify result with other tools. – Kamil Szot May 15 '15 at 17:33
Still it would be great if someone came up with a tool for generating pretty printers from Antlr grammars. – Kamil Szot May 15 '15 at 17:37
1

Yep. Be great if that silent crowd of free labor added everything else needed to analyze and transform programs, too. Hasn't happened in the last 15 years. (Same thing is true of every other parsing engine on the planet). You can guess my prediction. See my essay on "Life After Parsing". – Ira Baxter May 15 '15 at 18:04
1

@Kamil if you only need the source to compile to the same binary, then walk the parse tree and spit out the *tokens* (ie only the leafs of this tree), with a space in between each token, and skip the comments, `#region` etc (I don't know how the grammar handles these). You'll get a one-liner with the same C# code that you've got on input. – Lucas Trzesniewski May 16 '15 at 12:35

How to parse C# code then serialize it back to text file

1 Answers1