
I looked at the source code at http://referencesource.microsoft.com/, and it appears all the source code is in C#.

I also looked at the source code for the new C# compiler platform (Roslyn), and it is also in C#. How is that possible? Is the C# compiler written in C#, or am I missing something obvious? If the C# compiler is written in C#, how does it work?

Shahzeb
CriketerOnSO
  • Many compilers are written in the language they compile - Google [bootstrapping](http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29) to learn more. – Paul Roub Dec 16 '14 at 20:24
  • I think the _original_ compiler was written in C++. – Arian Motamedi Dec 16 '14 at 20:25
  • Well, a hammer can be forged by using another hammer. Previous version of it... – Eugene Sh. Dec 16 '14 at 20:25
  • By using a spec. And backwards compatibility. – Peter Dec 16 '14 at 20:25
  • it gets compiled into IL – Sam I am says Reinstate Monica Dec 16 '14 at 20:25
  • The link you posted is the link to the source code of the Framework library, not to the compiler. – Steve Dec 16 '14 at 20:26
  • Possibly related: [Implementing a compiler in “itself”](http://stackoverflow.com/questions/193560/implementing-a-compiler-in-itself) and [Bootstrapping a language](http://stackoverflow.com/questions/193560/implementing-a-compiler-in-itself) – Habib Dec 16 '14 at 20:58
  • It's not nearly as mind blowing as something like a self-hosted JVM implementation written in Java (JikesRVM). – SK-logic Dec 17 '14 at 02:02
  • @SK-logic: AFAIK, JikesRVM is basically a statically compiled VM, which just happens to be written in Java. What is more mindblowing IMO, is something like the Maxine RVM, which runs inside of itself, compiling itself with its own dynamic JIT compiler while it is running. So, in Jikes, there is still a clear separation between compiling the VM and running the VM, at least as far as I understand it. – Jörg W Mittag Dec 17 '14 at 16:10
  • I'm pretty sure for most popular languages there are compilers written in that language. – iFreilicht Dec 17 '14 at 17:10
  • I remember being blown away by code like this in a Lisp interpreter: `(defun car (cons) (car cons))`. It looks like infinite recursion, but it isn't, because of open-coding in the compiler. – Barmar Dec 23 '14 at 22:49

4 Answers


The original C# compiler wasn't written in C#; it was written in C and C++. The new Roslyn compiler is written in C#, but it was initially compiled with the old compiler. Once the new compiler was done, it was able to compile its own source code: this is called bootstrapping.
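The bootstrapping sequence can be modeled in a few lines of Python. This is a toy sketch, not Roslyn's actual build: `Binary`, `Source`, `compile_with`, and the stage names are hypothetical, and a "binary" only records how it behaves as a compiler and which compiler generated its code.

```python
from dataclasses import dataclass

# Toy model of bootstrapping (hypothetical; not how Roslyn is really built).

@dataclass(frozen=True)
class Binary:
    codegen: str       # code-generation strategy this binary uses when it runs
    machine_code: str  # stand-in for the bytes produced when it was built

@dataclass(frozen=True)
class Source:
    name: str
    codegen: str       # strategy the compiler described by this source implements

def compile_with(compiler: Binary, source: Source) -> Binary:
    """Run `compiler` on `source`: the result behaves as the source describes,
    but its machine code reflects the compiler that produced it."""
    return Binary(codegen=source.codegen,
                  machine_code=f"{compiler.codegen}({source.name})")

# Stage 0: the existing native compiler (stand-in for the C/C++ one)
stage0 = Binary(codegen="legacy", machine_code="native C++ build")
roslyn_src = Source(name="roslyn-src", codegen="roslyn")

stage1 = compile_with(stage0, roslyn_src)  # new compiler, built by the old one
stage2 = compile_with(stage1, roslyn_src)  # new compiler, built by itself
stage3 = compile_with(stage2, roslyn_src)  # rebuilt again as a consistency check

assert stage1.codegen == stage2.codegen == "roslyn"  # same behavior
assert stage1 != stage2   # built by different compilers, so different bytes
assert stage2 == stage3   # the self-built compiler reproduces itself
```

The `stage2 == stage3` check mirrors the comparison that multi-stage bootstrap builds (GCC's, for instance) make between their stage 2 and stage 3 outputs: once the compiler is built by itself, rebuilding it again should change nothing.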

Thomas Levesque
  • So when a change has to be made to the "original compiler", does that have to be compiled with the old compiler *(written in C, C++)*? – CriketerOnSO Dec 16 '14 at 20:31
  • There would be no need to change the "original compiler"; the newer versions would be modified. – Pseudonym Dec 16 '14 at 20:35
  • @CriketerOnSO, the new compiler will replace the old one, so there will be no need to modify the old one. But if MS wanted to do that, they would recompile the old compiler with a C++ compiler, as they did before. – Thomas Levesque Dec 16 '14 at 20:40
  • @ThomasLevesque Self-hosting is the end result of bootstrapping. – arx Dec 17 '14 at 00:19
  • @CriketerOnSO: No, of course not. How would it even be possible to compile the old compiler with the old compiler? The compiler is for C#, but it is written in C++; you need a C++ compiler to compile it. – Jörg W Mittag Dec 17 '14 at 06:02
  • The same applies to C/C++ compilers, which have a little 'bootstrapping' in assembly that compiles a small subset of C, which in turn compiles a larger subset of C with more support for 'high-level' programming, and so on... until reaching the last/current subset of the compiler, which can be tested by compiling itself. This can be used for any kind of language. – Luciano Dec 24 '14 at 13:56
  • When new keywords are added to the language, how will it be compiled? We don't have a compiler (yet) that understands the new tokens. – Sriram Sakthivel Apr 14 '15 at 11:36
  • @SriramSakthivel, the code of the compiler can't use the new keywords, at least not until there is a compiler that understands them. You always use an older version of the compiler to build the new one. – Thomas Levesque Apr 14 '15 at 13:08

Compilers are utility programs: they turn programming-language text into machine code. If that text happens to describe software that is itself a compiler, that makes no difference to the compiler doing the translating.

Compilers can also produce machine code for other architectures. For example, Apple compiles iOS using racks of Intel-based servers. The compiler does not have to run the ARM code it generates, just write it to disk.
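Cross-compilation can be illustrated with a small Python sketch. The two target "architectures" below (`toy-le`, `toy-be`) and their opcodes are invented for illustration; the point is only that the compiler encodes bytes in the *target's* format (here including its byte order) and writes them to a file without ever executing them on the host.

```python
import struct
import tempfile
from pathlib import Path

# Two hypothetical targets with different opcodes and byte orders.
TARGETS = {
    "toy-le": {"byteorder": "<", "opcodes": {"PUSH": 0x01, "ADD": 0x02, "MUL": 0x03}},
    "toy-be": {"byteorder": ">", "opcodes": {"PUSH": 0x10, "ADD": 0x20, "MUL": 0x30}},
}

def emit(program, target):
    """Encode (opcode, *operands) tuples as bytes for the chosen target."""
    spec = TARGETS[target]
    out = bytearray()
    for op, *operands in program:
        out.append(spec["opcodes"][op])
        for value in operands:
            # 32-bit signed operand in the target's byte order, not the host's
            out += struct.pack(spec["byteorder"] + "i", value)
    return bytes(out)

def compile_to_file(program, target, path):
    data = emit(program, target)
    Path(path).write_bytes(data)  # written to disk, never run on this machine
    return len(data)

program = [("PUSH", 2), ("PUSH", 3), ("ADD",)]
with tempfile.TemporaryDirectory() as tmp:
    print(compile_to_file(program, "toy-be", Path(tmp) / "prog.bin"))  # prints 11
```

The host's own byte order never enters the picture: `struct` is told explicitly which layout to produce, just as a real cross-compiler follows the target's conventions.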

Compiler 2.0 must be written in a language that compiler 1.0 can process, but compiler 1.0 can certainly build a compiler 2.0 that has newer features, such as better optimization. You can then recompile the source code with compiler 2.0 and get a better version of itself. Again, the compiler doesn't know it's making another version of itself.

If we go far enough back into the mists of time then we do reach a point where we have no compiler - the very first iteration of a high-level language. Then we have to get out the pencils and opcode books and write the first one in assembly. How did we write the first assembler? Direct machine code entry, probably on punched paper tape, or flipping switches on the front panel.
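What the "opcode book" step does is essentially table lookup, which is exactly what an assembler automates. A minimal two-pass assembler in Python, assuming a made-up 8-bit instruction set (`LOAD`, `ADD`, `STORE`, `JMP`, `HALT` and their opcodes are hypothetical), shows the translation that once had to be done by hand:

```python
# Hypothetical instruction set: the "opcode book" is just a table.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HALT": 0xFF}

def assemble(text):
    """Two-pass assembler: pass 1 records label addresses, pass 2 emits bytes."""
    lines = [line.split(";")[0].strip() for line in text.splitlines()]
    lines = [line for line in lines if line]

    labels, address = {}, 0
    for line in lines:                    # pass 1: where does each label land?
        if line.endswith(":"):
            labels[line[:-1]] = address
        else:
            address += len(line.split())  # one opcode byte + one byte per operand

    code = bytearray()
    for line in lines:                    # pass 2: look up opcodes, resolve labels
        if line.endswith(":"):
            continue
        mnemonic, *operands = line.split()
        code.append(OPCODES[mnemonic])
        for operand in operands:
            code.append(labels[operand] if operand in labels else int(operand, 0))
    return bytes(code)

program = """
start:
    LOAD 0x10    ; hypothetical semantics: accumulator = mem[0x10]
    ADD 0x11     ; accumulator += mem[0x11]
    STORE 0x12   ; mem[0x12] = accumulator
    JMP start
"""
print(assemble(program).hex(" "))  # 01 10 02 11 03 12 04 00
```

The very first assembler could not, of course, be written in Python; the sketch only shows the mnemonic-to-number translation that was done with pencil and paper before any tool existed.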

paul
  • And the paper tape is just flipping switches via holes in the paper. :-) – Zan Lynx Dec 17 '14 at 00:11
  • Paper tape as a storage technology will *never* take off. It's just too complex and error-prone, plus it burns easily if there is a short circuit in the reader and that will completely destroy your program. – user Dec 18 '14 at 08:33

A compiler is just a program like any other program. There is nothing magical or special about it. It takes some input and produces some output. In this particular case, the input just happens to be C# and the output just happens to be CIL, but that's no different from the input being a series of tax returns and the output being a report.
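To make "input in, output out" concrete, here is a complete (if tiny) compiler as an ordinary Python program. It is a sketch under simplifying assumptions: the source language is integer arithmetic with `+`, `-`, `*` and parentheses, and the target is a hypothetical stack machine rather than CIL. `compile_expr` and `run` are illustrative names.

```python
import operator
import re

BINOPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def compile_expr(source):
    """Translate an arithmetic expression (text) into stack-machine instructions."""
    tokens = re.findall(r"\d+|[-+*()]", source)
    pos = 0
    code = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():    # expr := term (('+' | '-') term)*
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            code.append(("ADD",) if op == "+" else ("SUB",))

    def term():    # term := factor ('*' factor)*
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append(("MUL",))

    def factor():  # factor := NUMBER | '(' expr ')'
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            code.append(("PUSH", int(tok)))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return code

def run(code):
    """A tiny virtual machine that executes the compiler's output."""
    stack = []
    for op, *args in code:
        if op == "PUSH":
            stack.append(args[0])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(BINOPS[op](a, b))
    return stack.pop()

program = compile_expr("2 * (3 + 4)")
print(program)       # [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('ADD',), ('MUL',)]
print(run(program))  # 14
```

Nothing in `compile_expr` knows or cares what the input program means; if the input text happened to describe a compiler, it would be translated the same way.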

Jörg W Mittag

You write the first compiler for a language in whatever language is available. Call this program C# Compiler V1.0: it can read and compile any C# code that uses the current set of reserved words. Now suppose you want to introduce a feature that did not exist before, such as a `where` clause. You use C# Compiler V1.0, which obviously has no `where` support anywhere, to compile the source code of a new version, C# Compiler V2.0.

You may ask: but wait, C# Compiler V1.0 has no `where` clause. The point is that a compiler does a very specific job, and implementing it needs perhaps no more than 20% of what C# can offer anyway. Features like `yield` can be tricky to think about, but unless `yield` can be expressed in simpler terms, you would not be able to implement it easily anyway, whatever language the compiler itself is written in.
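"Expressing `yield` in simpler terms" means rewriting an iterator as an explicit state machine, which is roughly what the C# compiler does when it lowers an iterator method into a generated class with a state field. The Python sketch below is only an analogy (it is not the code the C# compiler generates, and `countdown`/`Countdown` are illustrative names): the generator uses `yield`, while the class does the same job with nothing but a field, an `if`, and a method call.

```python
def countdown(n):
    """Convenient form: a generator using `yield`."""
    while n > 0:
        yield n
        n -= 1

class Countdown:
    """The same iterator with `yield` lowered into an explicit state machine.
    States: 0 = not started, 1 = suspended at the `yield`, 2 = finished."""

    def __init__(self, n):
        self.n = n
        self.state = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.state == 1:    # resuming after a `yield`:
            self.n -= 1        # run the statement that followed `yield n`
        if self.state != 2 and self.n > 0:
            self.state = 1     # suspend at the `yield`
            return self.n
        self.state = 2         # loop condition failed: iterator is exhausted
        raise StopIteration

print(list(countdown(3)), list(Countdown(3)))  # [3, 2, 1] [3, 2, 1]
```

Once the lowering is understood, the compiler that performs it needs no `yield` of its own, which is the answer's point.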

Once C# Compiler V2.0 has been built, you recompile its source code with the new compiler, even though you do not need the `where` clause and it may not be used anywhere in the code of C# Compiler V2.0. The result, C# Compiler V2.0 built from its own source by C# Compiler V2.0, is your new C# Compiler V2.0.

Before that final recompilation, since the new compiler understands the new syntax, you are free to adjust the compiler's own code and use the new syntax in it, if you think it will improve anything. However, there is only a small chance that a new syntax will improve the compiler itself.
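The V1.0 to V2.0 sequence above can be modeled in Python. This is a hypothetical sketch: `Compiler`, `build`, `UnknownKeywordError`, and the keyword sets are invented for illustration, and a compiler's source is reduced to the list of keywords it *uses*, kept separate from the keywords it *implements*.

```python
class UnknownKeywordError(Exception):
    """Raised when compiler source uses syntax the building compiler lacks."""

class Compiler:
    def __init__(self, version, keywords):
        self.version = version
        self.keywords = frozenset(keywords)

    def build(self, source_keywords, new_version, new_keywords):
        """Compile compiler source that *uses* `source_keywords` and *implements*
        `new_keywords`; the result is a compiler that understands `new_keywords`."""
        unknown = set(source_keywords) - self.keywords
        if unknown:
            raise UnknownKeywordError(
                f"compiler {self.version} does not understand {sorted(unknown)}")
        return Compiler(new_version, new_keywords)

V1_KEYWORDS = {"class", "if", "for", "return"}
V2_KEYWORDS = V1_KEYWORDS | {"where"}
v1 = Compiler("1.0", V1_KEYWORDS)

# Step 1: V2's source implements `where` but does not use it, so V1 can build it.
v2_source = ["class", "if", "for", "return"]
v2 = v1.build(v2_source, "2.0", V2_KEYWORDS)

# Step 2: rebuild V2 from its own source with V2 itself.
v2_rebuilt = v2.build(v2_source, "2.0", V2_KEYWORDS)

# Step 3: source that *uses* `where` needs a compiler that already knows it.
v2_source_using_where = v2_source + ["where"]
try:
    v1.build(v2_source_using_where, "2.0", V2_KEYWORDS)
except UnknownKeywordError as err:
    print(err)  # compiler 1.0 does not understand ['where']
v2_rebuilt.build(v2_source_using_where, "2.0", V2_KEYWORDS)  # succeeds
```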