Looking for a C# code parser

Question

I'm looking for a set of classes (preferably in the .net framework) that will parse C# code and return a list of functions with parameters, classes with their methods, properties etc. Ideally it would provide all that's needed to build my own intellisense.

I have a feeling something like this should be in the .net framework, given all the reflection stuff they offer, but if not then an open source alternative is good enough.

What I'm trying to build is basically something like Snippet Compiler, but with a twist. I'm trying to figure out how to get the code dom first.

I tried googling for this but I'm not sure what the correct term for this is so I came up empty.

Edit: Since I'm looking to use this for intellisense-like processing, actually compiling the code won't work since it will most likely be incomplete. Sorry I should have mentioned that first.

Does it have to work with non-complete code or code with errors (i.e. code that wouldn't compile with the normal compiler?) That's typically a requirement for IntelliSense-style parsers. — Dean Harding, Mar 15 '10 at 05:03
Yea it should work on incomplete code. I'm looking for on-the-fly stuff. — Blindy, Mar 15 '10 at 05:07

score 5 · Accepted Answer · edited May 23 '17 at 12:13

While .NET's CodeDom namespace defines the basic API for language parsers, the parsers themselves are not implemented. Visual Studio does its parsing through its own language services, which are not available in the redistributable framework.

You could either:

1. Compile the code, then use reflection on the resulting assembly.
2. Look at something like the Mono C# compiler, which creates these syntax trees. It won't be a high-level API like CodeDom, but maybe you can work with it.

There may be something on CodePlex or a similar site.

UPDATE
See this related post: Parser for C#

edited May 23 '17 at 12:13 · Community

answered Mar 15 '10 at 05:05 · Josh

+1 for update too - it contains workable solutions – John K Mar 15 '10 at 05:42

score 2 · Answer 2 · answered Mar 15 '10 at 05:14

If you need it to work on incomplete code, or code with errors in it, then I believe you're pretty much on your own (that is, you won't be able to use the CSharpCodeProvider class or anything like that).

There are tools like ReSharper which do their own parsing, but they're proprietary. You might be able to start with the Mono compiler, but in my experience, writing a parser that works on incomplete code is a whole different ballgame from writing one that's just supposed to spit out errors on incomplete code.

If you just need the names of classes and methods (metadata, basically) then you might be able to do the parsing "by hand", but I guess it depends on how accurate you need the results to be.
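To make the "by hand" idea concrete, here is a rough sketch that pulls type and method names out of source text with regular expressions. Everything in it (the RoughOutline class, both patterns, the keyword list) is a hypothetical illustration, not an existing library, and it makes no attempt to handle comments, string literals, nested generics, or attributes. Its only real virtue is that it does not care whether the code is complete.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only: a heuristic outline, not a real parser.
public static class RoughOutline
{
    // "class Foo", "struct Bar", "interface IBaz", "enum Color".
    // Generic parameter lists such as <T> are simply ignored.
    private static readonly Regex TypePattern = new Regex(
        @"\b(class|struct|interface|enum)\s+([A-Za-z_]\w*)");

    // "ReturnType Name(args) {" with no nested parentheses in the argument list.
    private static readonly Regex MethodPattern = new Regex(
        @"\b([A-Za-z_][\w.<>\[\]]*)\s+([A-Za-z_]\w*)\s*\(([^()]*)\)\s*\{");

    // Statement keywords that can look like a return type or a method name.
    private static readonly HashSet<string> Keywords = new HashSet<string>
    {
        "if", "for", "foreach", "while", "switch", "using", "lock", "catch",
        "return", "new", "else", "throw", "await"
    };

    public static List<string> FindTypes(string source)
    {
        var found = new List<string>();
        foreach (Match m in TypePattern.Matches(source))
            found.Add(m.Groups[1].Value + " " + m.Groups[2].Value);
        return found;
    }

    public static List<string> FindMethods(string source)
    {
        var found = new List<string>();
        foreach (Match m in MethodPattern.Matches(source))
        {
            string returnType = m.Groups[1].Value;
            string name = m.Groups[2].Value;
            if (Keywords.Contains(returnType) || Keywords.Contains(name))
                continue; // e.g. "else if (...) {" or "new Foo() {"
            found.Add(returnType + " " + name + "(" + m.Groups[3].Value.Trim() + ")");
        }
        return found;
    }

    public static void Main()
    {
        // Deliberately incomplete: neither the class nor Pop() is ever closed.
        const string source = @"
public class Stack<T>
{
    public void Push(T item) { items.Add(item); }
    public T Pop() {
        if (items.Count == 0) {
";
        foreach (string t in FindTypes(source)) Console.WriteLine(t);
        foreach (string m in FindMethods(source)) Console.WriteLine(m);
    }
}
```

How far this gets you depends on how accurate the results need to be: it will miss expression-bodied members and anything with parentheses inside the parameter list, and it will report matches that sit inside comments or strings.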

Yea I'm beginning to consider parsing it by hand. Not sure how difficult this will be with generics though. — Blindy, Mar 15 '10 at 05:17

score 2 · Answer 3 · answered Mar 15 '10 at 18:04

The Mono project's GMCS compiler contains a pretty reusable parser for C# 4.0. It is also relatively easy to write your own parser that will suit your specific needs. For example, you can reuse this: http://antlrcsharp.codeplex.com/

answered Mar 15 '10 at 18:04 · SK-logic

The problem with these already-made parsers is that they won't work for incomplete (and thus invalid) code. Their purpose is to create a syntax tree detailed enough to generate code, not to provide data for intellisense. – Blindy Mar 15 '10 at 18:30
Yep. But, as they are reusable, one can easily tweak them. ANTLR may be used for partial parsing. But of course the most generic option is PEG, so if you can get hold of a decent PEG implementation for .NET, and you can port an existing, say, ANTLR parser, you'll get a quick and easy generic solution. For example, a Packrat parser from http://www.meta-alternative.net/mbase.html is capable of generating syntax highlighting modes for a text editor, out of any generic syntax, and it works well with incomplete or invalid input. – SK-logic Mar 15 '10 at 18:50

score 1 · Answer 4 · answered Mar 15 '10 at 05:08

Have a look at CSharpCodeProvider in the Microsoft.CSharp namespace. You can compile using CSharpCodeProvider.CompileAssemblyFromSource and access the resulting assembly using CompilerResults.CompiledAssembly. From that assembly you can get the types, and from each type you can get all property and method information using reflection.

The performance will be pretty average, as you will need to compile all the source code whenever something changes. I am not aware of any methods that will let you incrementally compile snippets of code.
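A minimal sketch of that compile-then-reflect approach, assuming the classic .NET Framework (on .NET Core and later, CompileAssemblyFromSource throws PlatformNotSupportedException). The CompileAndReflect class and the Greeter sample source are made up for illustration. Note that the source has to compile cleanly, so this does not help with incomplete code.

```csharp
using System;
using System.CodeDom.Compiler;
using System.Linq;
using System.Reflection;
using Microsoft.CSharp;

// Hypothetical helper for illustration; assumes the .NET Framework.
public static class CompileAndReflect
{
    // Compiles the source in memory; returns null if there are compile errors.
    public static Assembly Compile(string source)
    {
        using (var provider = new CSharpCodeProvider())
        {
            var options = new CompilerParameters { GenerateInMemory = true };
            CompilerResults results = provider.CompileAssemblyFromSource(options, source);
            if (results.Errors.HasErrors)
            {
                foreach (CompilerError error in results.Errors)
                    Console.Error.WriteLine(error);
                return null;
            }
            return results.CompiledAssembly;
        }
    }

    public static void Main()
    {
        const string source = @"
public class Greeter
{
    public string Name { get; set; }
    public string Greet(string greeting, int times) { return greeting; }
}";
        Assembly assembly = Compile(source);
        if (assembly == null) return;

        // Only members declared on the type itself, not inherited ones.
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance |
                                   BindingFlags.Static | BindingFlags.DeclaredOnly;

        foreach (Type type in assembly.GetTypes())
        {
            Console.WriteLine("type " + type.Name);
            foreach (PropertyInfo p in type.GetProperties(flags))
                Console.WriteLine("  property " + p.PropertyType.Name + " " + p.Name);
            foreach (MethodInfo m in type.GetMethods(flags))
            {
                if (m.IsSpecialName) continue; // skip property accessors (get_Name, set_Name)
                string parameters = string.Join(", ",
                    m.GetParameters().Select(p => p.ParameterType.Name + " " + p.Name));
                Console.WriteLine("  method " + m.ReturnType.Name + " " + m.Name + "(" + parameters + ")");
            }
        }
    }
}
```

Every call to Compile runs the whole compiler again, which is where the performance cost mentioned above comes from.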

score 1 · Answer 5 · edited Apr 15 '23 at 14:01

Have you tried using the Microsoft.CSharp.CSharpCodeProvider class? This is a full C# code provider that supports CodeDom. You would simply need to call .Parse() on a text stream, and you get a CodeCompileUnit back.

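using System.IO;
using Microsoft.CSharp;

var codeStream = new StringReader(code);
var codeProvider = new CSharpCodeProvider();

// Caution: CSharpCodeProvider does not implement Parse; this call throws NotImplementedException.
var compileUnit = codeProvider.Parse(codeStream);

// compileUnit would contain your code dom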

Well, seeing as the above does not work (I just tested it), the following article might be of interest. I bookmarked it a good long time ago, so I believe it only supports C# 2.0, but it might still be worth it:

Generate Code-DOMs directly from C# or VB.NET

edited Apr 15 '23 at 14:01 · Glorfindel

answered Mar 15 '10 at 05:11 · jrista

This is not implemented by any of the code dom providers and throws a NotImplementedException. – Josh Mar 15 '10 at 05:14
@Josh: Seems you are correct. I just tried, and it does indeed fail. Such a bummer. – jrista Mar 15 '10 at 05:22

score 1 · Answer 6 · answered Oct 03 '11 at 23:30

It might be a bit late for Blindy, but I recently released a C# parser that would be perfect for this sort of thing, as it's designed to handle code fragments and retains comments: C# Parser and CodeDOM

It handles C# 4.0 and also the new 'async' feature. It's commercial, but is a small fraction of the cost of other commercial compilers.

I really think few people realize just how difficult parsing C# has become, especially if you need to resolve symbolic references properly (which is usually required, unless maybe you're just doing formatting). Just try to read and fully understand the Type Inference section of the 500+ page language specification. Then, meditate on the fact that the spec is not actually fully correct (as mentioned by Eric Lippert himself).
