
I've scoured the internet looking for beginner-level information on building abstract syntax trees for C#, but everything I find assumes the reader is already 'in the know'. I'm a line-of-business application developer, so topics like these are a bit over my head, but this is for my own education and I'm willing to spend the time to learn whatever concepts are necessary.

Generally, I'd like to learn the techniques behind building an abstract representation of code from a source string. More specifically, I'd like to use such an AST to do C# syntax highlighting. (I realize that syntax highlighting doesn't necessarily need an AST, but this seems like a good opportunity to learn some "compiler"-level techniques.)
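To make the point that highlighting doesn't need an AST concrete, here is a minimal sketch of lexer-only highlighting. It is written in Python rather than C# to keep it short. One regular expression splits the text into tokens, and a highlighter would map each token kind to a colour. The keyword set is a small assumed subset of C#'s real keyword list. Verbatim strings, char literals, and preprocessor lines are not handled.

```python
import re

# Assumed, deliberately small subset of C# keywords; the real list is much longer.
KEYWORDS = {"class", "int", "string", "return", "if", "else",
            "public", "static", "void", "var", "new"}

# Alternatives are tried left to right, so comments and strings come before
# the catch-all "punct" rule that would otherwise grab a lone "/" or '"'.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<space>\s+)
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)


def highlight_tokens(source):
    """Split source into (kind, text) pairs; a highlighter maps each kind to a colour."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "word":
            # One regex rule covers both; only a lookup tells keywords from identifiers.
            kind = "keyword" if text in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens


for kind, text in highlight_tokens('int x = 42; // answer'):
    print(f"{kind:10} {text!r}")
```

The lexer never builds a tree. It only needs to know where each token starts and ends, which is why an AST is overkill for plain colouring.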

I apologize if this question is a bit broad, but I'm not sure how else to ask.

Thanks!

svick
Vince Fedorchak
  • FWIW, if you want a good place to start on compilers, the dragon book is (IMHO) a great book. http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools – James Manning May 21 '12 at 02:12

3 Answers


First you need to understand what parsing is and what abstract syntax trees are. For a first look, you can consult Wikipedia's article on abstract syntax trees.
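Here is a picture of what an abstract syntax tree holds: a hand-built sketch, in Python, of the tree for the source text `1 + 2 * 3`. The node classes `Num` and `BinOp` are hypothetical, chosen for this illustration. The spaces are gone, and operator precedence is recorded by the shape of the tree rather than by any token.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical node classes for a tiny arithmetic language.
@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str        # "+", "-", "*" or "/"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

# The AST for the source text "1 + 2 * 3", built by hand.
# "*" sits below "+", which is how the tree records that it binds tighter.
tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))


def show(node: Expr) -> str:
    """Print the tree back as a fully parenthesized expression."""
    if isinstance(node, Num):
        return str(node.value)
    return f"({show(node.left)} {node.op} {show(node.right)})"


print(show(tree))  # (1 + (2 * 3))
```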

You really need to spend some time with a compiler textbook to understand how abstract syntax trees relate to parsing and how they can be constructed while parsing; the classic reference is Aho/Sethi/Ullman's "Compilers" book (easily found on the web). You may also find the SO answer to "Are there any 'fun' ways to learn about Languages, Grammars, Parsing and Compilers?" instructive.
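The textbook technique of building an AST while parsing can be sketched as a recursive-descent parser: one method per grammar rule, each returning the node it recognized. The three-rule arithmetic grammar and the tuple node shapes below are assumptions chosen for brevity, and the code is Python rather than C#.

```python
import re


def tokenize(text):
    """Split the input into numbers, operators and parentheses, skipping spaces."""
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    """Recursive-descent parser; each grammar rule is one method returning an AST node.

    Grammar (assumed for this sketch):
        expr   := term (("+" | "-") term)*
        term   := factor (("*" | "/") factor)*
        factor := NUMBER | "(" expr ")"
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())   # left-associative: the tree grows as we go
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")   # parentheses steer the parse but leave no node behind
            return node
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return ("num", int(tok))


print(Parser("1 + 2 * 3").parse())  # ('+', ('num', 1), ('*', ('num', 2), ('num', 3)))
```

Precedence falls out of the grammar's layering: `term` sits below `expr`, so multiplications are grouped before the additions that contain them.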

Once you understand how to build an AST for a simple grammar, you can turn your attention to something like C#. The issue there is sheer scale: it is one thing to play with a toy language of 20 grammar rules, and another to work with a grammar of several hundred or a thousand rules. Experience with small grammars will make it much easier to understand how the big ones are put together, and how to live with them.

You probably don't want to build your own C# grammar (or implement the one from the C# standard); it's quite a lot of work. There are available tools that will hand you C# ASTs (Roslyn has already been mentioned; ANTLR has a C# parser; there are many more).
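Roslyn and ANTLR can't be shown in a short Python snippet. As an analogy only, Python's standard `ast` module is a ready-made parser for Python source in the same spirit. The sketch below gets a complete tree from `ast.parse` without writing a single grammar rule, then queries it with `ast.walk`.

```python
import ast

SOURCE = '''
def area(w, h):
    return w * h  # width times height

def greet(name):
    return "hello " + name
'''

# Analogy only: Python's own parser, not Roslyn. It runs Python's grammar over
# the text and returns the root Module node.
tree = ast.parse(SOURCE)


def function_names(module):
    """Collect the names of all function definitions anywhere in the tree."""
    return [node.name for node in ast.walk(module) if isinstance(node, ast.FunctionDef)]


def binary_operators(module):
    """Collect the operator class name of every binary expression, e.g. 'Mult'."""
    return [type(node.op).__name__ for node in ast.walk(module) if isinstance(node, ast.BinOp)]


print(function_names(tree))  # ['area', 'greet']
```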

It is true that you might use an AST for syntax highlighting (although that is probably killing a gnat with a sledgehammer). What most people don't think much about, but the compiler books emphasize, is what happens after you have an AST: by themselves, ASTs mostly aren't useful, and you need a lot more machinery to do anything interesting. Rather than repeat this over and over (I keep seeing the same kind of questions), see my discussion on Life After Parsing for more details.
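Here is a small illustration of the "machinery after parsing" point: two separate passes over one AST. The first is name resolution, which needs a scope (a miniature symbol table). The second is evaluation. The four node shapes belong to a hypothetical toy language, not C#, and are written as Python tuples.

```python
# Hypothetical node shapes for a tiny expression language (not C#):
#   ("num", n) | ("var", name) | ("add", left, right) | ("let", name, value, body)

def undefined_names(node, scope=frozenset()):
    """Name resolution: report variables used outside any enclosing 'let'."""
    kind = node[0]
    if kind == "num":
        return []
    if kind == "var":
        return [] if node[1] in scope else [node[1]]
    if kind == "add":
        return undefined_names(node[1], scope) + undefined_names(node[2], scope)
    if kind == "let":
        _, name, value, body = node
        # The bound name is visible in the body but not in its own initializer.
        return undefined_names(value, scope) + undefined_names(body, scope | {name})
    raise ValueError(f"unknown node kind {kind!r}")


def evaluate(node, env=None):
    """Interpretation: walk the same tree again, this time computing a value."""
    env = env or {}
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "add":
        return evaluate(node[1], env) + evaluate(node[2], env)
    if kind == "let":
        _, name, value, body = node
        return evaluate(body, {**env, name: evaluate(value, env)})
    raise ValueError(f"unknown node kind {kind!r}")


# let x = 2 in x + (let y = 3 in x + y)
good = ("let", "x", ("num", 2),
        ("add", ("var", "x"), ("let", "y", ("num", 3), ("add", ("var", "x"), ("var", "y")))))
# let x = 1 in x + z   (z is never defined)
bad = ("let", "x", ("num", 1), ("add", ("var", "x"), ("var", "z")))

print(undefined_names(good), evaluate(good))  # [] 7
print(undefined_names(bad))                   # ['z']
```

Neither pass is part of parsing. Each is extra machinery layered on the tree, and a real C# front end needs many more (types, overloads, flow analysis).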

Ira Baxter
  • I know this is a bit late, but have you looked at [GOLD Parser](http://goldparser.org/)? This program lets you construct a grammar using BNF rules and can generate skeleton code in any language to process the parse tree, i.e. to interpret code as you walk the parse tree, or to generate code. – Intrepid Nov 19 '12 at 21:25
  • @Mike Clarke: I thought GOLD only parsed. It actually builds a parse tree? There's no evidence of this that I can see from the web pages http://goldparser.org/doc/index.htm – Ira Baxter Nov 19 '12 at 21:35
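The "generate code as you walk the tree" use mentioned in the comment can be sketched as a post-order walk: emit code for the children, then for the operator. The instruction names `PUSH` and `OP`, and the tiny stack machine that runs them, are hypothetical. This is not GOLD's output, just plain Python.

```python
import operator

# Hypothetical AST shapes: ("num", n) or (op, left, right) with op in "+-*".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def compile_expr(node, out=None):
    """Post-order walk: emit code for both children, then the operator."""
    if out is None:
        out = []
    if node[0] == "num":
        out.append(("PUSH", node[1]))
    else:
        op, left, right = node
        compile_expr(left, out)
        compile_expr(right, out)
        out.append(("OP", op))
    return out


def run(code):
    """A tiny stack machine that executes the generated instructions."""
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()   # right operand was pushed last
            stack.append(OPS[arg](left, right))
    return stack.pop()


tree = ("+", ("num", 1), ("*", ("num", 2), ("num", 3)))   # 1 + 2 * 3
print(compile_expr(tree))
print(run(compile_expr(tree)))  # 7
```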

You should probably take a look at this talk by Phil Trelford:

Write your own compiler in 24 hours

This man is a genius, and will leave you fired up to learn about compilers. He explains it simply enough for a five-year-old to understand, literally: the five-year-old in question is his son, who probably has an unfair advantage, but five is five.

Jonathan

Take a look at Roslyn. I think it could be what you're looking for. It gives you access to the compiler's AST, among lots of other amazing things!

http://blogs.msdn.com/b/visualstudio/archive/2011/10/19/introducing-the-microsoft-roslyn-ctp.aspx

Beyond that, I suggest a textbook on compilers.

svick
  • I think Roslyn is not a good example of an *abstract* syntax tree. Its syntax tree contains every semicolon, comment and whitespace, which makes it a very concrete syntax tree. But if syntax highlighting was the goal, Roslyn would be a good choice. – svick May 21 '12 at 00:15
  • Any particular textbook you could recommend? I'm not really looking for a ready-made solution, I'm looking to edify myself by developing my own. – Vince Fedorchak May 21 '12 at 00:18
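svick's concrete-versus-abstract distinction can be seen in Python. It is used here only as an analogy, since Python's standard library has no full-fidelity tree like Roslyn's. The `tokenize` module's token stream keeps the comment, while the tree from `ast.parse` keeps only the structure.

```python
import ast
import io
import tokenize

SOURCE = "total = price * qty  # compute the total\n"

# Concrete view (analogy): the token stream keeps the comment, and token
# positions preserve the spacing, roughly what a full-fidelity tree retains.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

# Abstract view: ast keeps the assignment and the multiplication; the comment
# and the spacing are gone.
dump = ast.dump(ast.parse(SOURCE))

print(comments)  # ['# compute the total']
print(dump)
```

For highlighting, the concrete view is the useful one, because every character of the file has to end up with some colour.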