184

I am researching CoffeeScript on the website http://coffeescript.org/, and it has the text

The CoffeeScript compiler is itself written in CoffeeScript

How can a compiler compile itself, or what does this statement mean?

nbro
  • 15,395
  • 32
  • 113
  • 196
AlexanderRD
  • 2,069
  • 2
  • 11
  • 19
  • 14
    Another term for a compiler that can compile itself is a `self-hosting` compiler. See http://programmers.stackexchange.com/q/263651/6221 – oɔɯǝɹ Jun 18 '16 at 23:01
  • 48
    There are at least two copies of the compiler involved. A pre-existing one compiles a new copy. The new one may or may not be identical to the old one. – bdsl Jun 19 '16 at 09:37
  • 13
    You may also be interested in Git: its source code is tracked, of course, in a Git repository. – Greg d'Eon Jun 20 '16 at 15:39
  • 7
    This is about like asking "How could a Xerox Printer print the schematics to itself?" Compilers compile text to byte code. If the compiler can compile to any usable byte code, you could write the compiler code in the respective language and then pass the code through the compiler to generate the output. – RLH Jun 21 '16 at 14:11
  • Re: self-compilation, the GHC compiler (Haskell) for instance is not able to compile itself, as during compilation, the code is eventually turned into C, and therefore it requires GCC to produce machine code. – sleblanc Jun 23 '16 at 01:24
  • 3
    @AlexD Yes, you can use a hammer to make *another* hammer, but you cannot use a hammer to make itself! It works the same with compilers! A compiler cannot compile itself! – pabrams Jun 23 '16 at 21:12
  • Bootstrapping: http://stackoverflow.com/questions/193560/writing-a-compiler-in-its-own-language – Luciano Jul 07 '16 at 12:50
  • Does this answer your question? [Writing a compiler in its own language](https://stackoverflow.com/questions/193560/writing-a-compiler-in-its-own-language) – Josh Correia Apr 26 '21 at 18:44

10 Answers10

236

The first edition of a compiler can't be machine-generated from a programming language specific to it; your confusion is understandable. A later version of the compiler with more language features (with source rewritten in the first version of the new language) could be built by the first compiler. That version could then compile the next compiler, and so on. Here's an example:

  1. The first CoffeeScript compiler is written in Ruby, producing version 1 of CoffeeScript
  2. The source code of the CS compiler is rewritten in CoffeeScript 1
  3. The original CS compiler compiles the new code (written in CS 1) into version 2 of the compiler
  4. Changes are made to the compiler source code to add new language features
  5. The second CS compiler (the first one written in CS) compiles the revised new source code into version 3 of the compiler
  6. Repeat steps 4 and 5 for each iteration

Note: I'm not sure exactly how CoffeeScript versions are numbered, that was just an example.

This process is usually called bootstrapping. Another example of a bootstrapping compiler is rustc, the compiler for the Rust language.

nbro
  • 15,395
  • 32
  • 113
  • 196
Ben N
  • 2,883
  • 4
  • 26
  • 49
  • 6
    The other route for bootstrapping a compiler is to write an interpreter for (a subset) of your language. – Aron Jun 22 '16 at 08:51
  • As one more alternative to bootstrapping with a compiler or interpreter written in another language, the very old-school route would be to hand-assemble the compiler source. Chuck Moore runs through how to do this for a Forth interpreter in chapter 9, "Programs that bootstrap", at the end of *Programming a Problem-Oriented Language* (https://web.archive.org/web/20160327044521/www.colorforth.com/POL.htm), based on having done it twice before by hand. Code entry here is done via a front panel that allows direct storage of values to memory addresses controlled by toggle switches for bits. – Jeremy W. Sherman May 09 '17 at 14:32
61

In the paper Reflections on Trusting Trust, Ken Thompson, one of the originators of Unix, writes a fascinating (and easily readable) overview of how the C compiler compiles itself. Similar concepts can be applied to CoffeeScript or any other language.

The idea of a compiler that compiles its own code is vaguely similar to a quine: source code that, when executed, produces as output the original source code. Here is one example of a CoffeeScript quine. Thompson gave this example of a C quine:

char s[] = {
    '\t',
    '0',
    '\n',
    '}',
    ';',
    '\n',
    '\n',
    '/',
    '*',
    '\n',
    … 213 lines omitted …
    0
};

/*
 * The string s is a representation of the body
 * of this program from '0'
 * to the end.
 */

main()
{
    int i;

    printf("char\ts[] = {\n");
    for(i = 0; s[i]; i++)
        printf("\t%d,\n", s[i]);
    printf("%s", s);
}

Next, you might wonder how the compiler is taught that an escape sequence like '\n' represents ASCII code 10. The answer is that somewhere in the C compiler, there is a routine that interprets character literals, containing some conditions like this to recognize backslash sequences:

…
c = next();
if (c != '\\') return c;        /* A normal character */
c = next();
if (c == '\\') return '\\';     /* Two backslashes in the code means one backslash */
if (c == 'r')  return '\r';     /* '\r' is a carriage return */
…

So, we can add one condition to the code above…

if (c == 'n')  return 10;       /* '\n' is a newline */

… to produce a compiler that knows that '\n' represents ASCII 10. Interestingly, that compiler, and all subsequent compilers compiled by it, "know" that mapping, so in the next generation of the source code, you can change that last line into

if (c == 'n')  return '\n';

… and it will do the right thing! The 10 comes from the compiler, and no longer needs to be explicitly defined in the compiler's source code.1

That is one example of a C language feature that was implemented in C code. Now, repeat that process for every single language feature, and you have a "self-hosting" compiler: a C compiler that is written in C.


1 The plot twist described in the paper is that since the compiler can be "taught" facts like this, it can also be mis-taught to generate trojaned executables in a way that is difficult to detect, and such an act of sabotage can persist in all compilers produced by the tainted compiler.

200_success
  • 7,286
  • 1
  • 43
  • 74
  • 8
    While this is an interesting bit of information, I don't think it answers the question. Your examples assume you already have a bootstrapped compiler, or else in which language is the C compiler written? – Arturo Torres Sánchez Jun 20 '16 at 05:10
  • 10
    @ArturoTorresSánchez Different explanations work well for different people. I'm not aiming to reiterate what has been said in other answers. Rather, I find the other answers speak at a higher level than how I like to think. I personally prefer a concrete illustration of how one single feature is added, and letting the reader extrapolate from that, instead of a shallow overview. – 200_success Jun 20 '16 at 06:00
  • 5
    OK, I understand your perspective. It's just that the question is more “how can a compiler compile itself if the compiler to compile the compiler doesn't exist” and less “how to add new features to a bootstrapped compiler”. – Arturo Torres Sánchez Jun 20 '16 at 15:03
  • Related reading: **[Is Ken Thompson's compiler hack still a threat?](http://programmers.stackexchange.com/q/184874/)** –  Jun 20 '16 at 15:54
  • I agree with Arturo, the question is "how can a compiler compile itself before it exists", and the answer is "it can't", and the problem lies in OP's interpretation of the sentence "The CoffeeScript compiler is itself written in CoffeeScript", which does not translate to "The Coffeescript compiler compiled itself". – pabrams Jun 20 '16 at 20:06
  • This does not answer the question, which is more along the lines of how can a compiler for X be written in X. – Thorbjørn Ravn Andersen Jun 20 '16 at 20:24
  • 17
    The question itself is ambiguous and open-ended. It appears that some people interpret it to mean "how can a CoffeeScript compiler compile itself?". The flippant response, as given in a comment, is "why shouldn't it be able to compile itself, just like it compiles any code?" I interpret it to mean "how can a self-hosting compiler come into existence?", and have given an illustration of how a compiler can be taught about one of its own language features. It answers the question in a different way, by providing a low-level illustration of how it is implemented. – 200_success Jun 20 '16 at 20:32
  • 1
    @ArturoTorresSánchez: "[I]n which language is the C compiler written?" Long ago I maintained the original C compiler noted in the old K&R appendix (the one for IBM 360.) Many people know that first there was BCPL, then B, and that C was an improved version of B. In fact, there were many parts of that old compiler that were still written in B, and had never been rewritten to C. The variables were of the form single letter/digit, pointer arithmetic wasn't assumed to be automatically scaled, etc. That old code testified to the bootstrapping from B to C. The first "C" compiler was written in B. – Eliyahu Skoczylas Jun 26 '16 at 10:18
29

You have already gotten a very good answer, however I want to offer you a different perspective, that will hopefully be enlightening to you. Let's first establish two facts that we can both agree on:

  1. The CoffeeScript compiler is a program which can compile programs written in CoffeeScript.
  2. The CoffeeScript compiler is a program written in CoffeeScript.

I'm sure you can agree that both #1 and #2 are true. Now, look at the two statements. Do you see now that it is completely normal for the CoffeeScript compiler to be able to compile the CoffeeScript compiler?

The compiler doesn't care what it compiles. As long as it's a program written in CoffeeScript, it can compile it. And the CoffeeScript compiler itself just happens to be such a program. The CoffeeScript compiler doesn't care that it's the CoffeeScript compiler itself it is compiling. All it sees is some CoffeeScript code. Period.

How can a compiler compile itself, or what does this statement mean?

Yes, that's exactly what that statement means, and I hope you can see now how that statement is true.

Jörg W Mittag
  • 363,080
  • 75
  • 446
  • 653
  • 2
    I don't know much about coffee script but you could clarify point 2 by stating that it WAS written in coffee script but was since compiled and is then machine code. And anyhow, could you please explain the chicken and egg problem then. If the compiler was written in a language that a compiler had not yet been written for, then how can the compiler even run or be compiled? – barlop Jun 19 '16 at 09:47
  • 7
    Your statement 2 is incomplete/ inaccurate and very misleading . since as the first answer says, the first was not written in coffee script.. That is so relevant to his question. And as to "How can a compiler compile itself, or what does this statement mean?" You say "Yes" I suppose so(though my mind's a bit small), I see it's used to compile earlier versions of itself, rather than itself. But is it used to compile itself also? I supposed it'd be pointless to. – barlop Jun 19 '16 at 10:10
  • 2
    @barlop: Change statement 2 to "*Today*, the CoffeeScript compiler is a program written in CoffeeScript." Does that help you understand it better? A compiler is "just" a program that translates an input (code) into an output (program). So if you have a compiler for language Foo, then write the source code for a Foo-compiler in the language Foo itself, and feed that source to your first Foo-compiler, you get a second Foo-compiler as output. This done by a lot of languages (for example, all C compilers I know of are written in… C). – DarkDust Jun 19 '16 at 15:49
  • 3
    The compiler can't compile itself. The output file is not the same instance as the compiler that produces the output file. I hope you can see now how that statement is false. – pabrams Jun 20 '16 at 20:19
  • From a novice point of you, your description makes me think of the "which came first, the chicken or the egg" problem. – Mohamad Jun 21 '16 at 13:03
  • 4
    @pabrams Why do you assume that? The output could well be identical to the compiler used to produce it. For instance, if I compile GCC 6.1 with GCC 6.1, I get a version of GCC 6.1 compiled with GCC 6.1. And then if I use that to compile GCC 6.1, I also get a version of GCC 6.1 compiled with GCC 6.1, which should be identical (ignoring things like timestamps). – user253751 Jun 21 '16 at 22:25
  • 1
    @pabrams You are mistaken. Part of the process of porting *gcc* is to compare the outputs of the cross-compiler and the compiled compiler to ensure they are identical. – user207421 Jun 23 '16 at 01:24
  • 1
    @EJP Guys, I didn't say that the output is not identical, I said it's not the same instance. I'm not mistaken. – pabrams Jun 23 '16 at 20:55
  • 1
    @user20574 Think of it this way. A hammer can be used to make another hammer, but it cannot be used to make itself. It works the same with compilers. A compiler cannot compile itself! – pabrams Jun 23 '16 at 21:13
  • @pabrams So two identical executables may not be considered the same software? If I run `cp /usr/bin/gcc ~/gcc2`, then are `/usr/bin/gcc` and `~/gcc2` the same compiler? – user253751 Jun 23 '16 at 21:21
  • 1
    @user20574 That's not what I said. The second compiler is not "itself". A compiler that compiles its own source code into another compiler cannot be said to have compiled itself. A compiler cannot compile itself. It can however compile its own source code. I can't make it any more clear. (As for your second question, I ignored it because the wording you used is problematic. Those paths are not compilers.) – pabrams Jun 23 '16 at 22:19
  • @pabrams Is the source code for a compiler the compiler? In general, is the source code for a program the program? – user253751 Jun 23 '16 at 22:48
  • 1
    @user20574 No, the source code is not the compiler. Strictly speaking, the executable program is the binary that gets executed, not the source code. The word program is a little more ambiguous than the word compiler, so there's more leeway if you don't specify executable program. The word compiler, however, implies that it can compile things, and only the binary can do that. – pabrams Jun 24 '16 at 00:00
  • @pabrams What if the compiler is written in Python? Then there is no compiler? Or /usr/bin/python is the compiler? – user253751 Jun 24 '16 at 00:18
  • @user20574 Since python isn't a compiled language, I don't think that's relevant. You didn't even say what language you're compiling, but I'm pretty sure it still wouldn't be able to compile itself, unless you accept the null compile. I think you're just being silly. – pabrams Jun 24 '16 at 01:59
  • @pabrams The RPython compiler is written in RPython. It also happens to run in standard Python. If I were to run `python rpython.py some_rpython_program.rpy`, what is the compiler here? Is it `python`, or `rpython.py`? (And obviously, when I say that, I mean the files `python` and `rpython.py` refer to, not the filenames themselves) – user253751 Jun 24 '16 at 02:29
  • 1
    Why are you even arguing? Compilers don't compile themselves. Their input is source code, not compiled code, and even if it was I don't think a running file could use itself as input in any operating system. Even if there is some counterexample or some altered definition of a 'compiler' for which this might be true, in general, it's still true that a compiler cannot compile itself, and certainly not in coffeescript. – pabrams Jun 24 '16 at 12:37
  • @pabrams I see you declined to respond to my previous question. Regardless of that, why do we say "I compiled the program" and not "I compiled the source form of the program"? – user253751 Jun 28 '16 at 05:11
  • 1
    @immibis The word program is not the same as the word compiler, and yes, I already said that. My argument is that "a compiler cannot compile itself" - where in there do you see the word program? A compiler compiles things - by definition that's what a compiler does - and source code cannot compile things. Therefore you can't say that a compiler compiles itself when all it can do is compile its own source code. "Itself" refers to the compiler that compiles things, not the source code of the compiler which cannot compile things. – pabrams Jun 28 '16 at 14:33
  • @pabrams "A compiler can compile itself" has a definite and **correct** meaning. It means that a compiler (in an executable form) can take the source code specifying the rules that define it (a.k.a. itself) as input, and output the equivalent rule specifications in another format (compile). Asserting that the "itself" here has the extremely narrow definition of referring only to the executable bits being actively used as instructions for the interpreter (a.k.a cpu or vm) is incorrect and potentially harmful to people coming here hoping to be able to communicate with more than one other person. – 8bittree Jun 28 '16 at 17:34
  • 1
    I disagree. I think it's harmful to say a compiler can compile itself, since it's not true. Here is a publicly accepted definition of compiler: " A compiler is a computer program (or a set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language), with the latter often having a binary form known as object code." https://en.wikipedia.org/wiki/Compiler -- A compiler cannot compile itself, because that presumes that 'itself' includes its own source code, which does not fit the definition of compiler – pabrams Jun 28 '16 at 18:15
  • @8bittree (continued): Your definition of compiler says that a compiler's source code *IS* the compiler, but a compiler's source code cannot "transform source code into the target language". Therefore, your definition of compiler is wrong. A compiler is *not* its source code. A compiler *is* its byte code. – pabrams Jun 28 '16 at 18:26
  • @pabrams A compiler's source code is the definition of the compiler. ["For the purpose of clarity ‘source code’ is taken to mean any fully executable description of a software system."](https://en.wikipedia.org/wiki/Source_code). "Is" is a variant of the verb "be", one definition of which is ["3. having the state, quality, identity, nature, role, etc., specified."](https://www.google.com/search?hl=en&site=webhp&source=hp&q=define+be). Thus yes, I say the compiler's source code is the compiler, because the source code **is** the compiler in the context of "a compiler compiling itself." – 8bittree Jun 28 '16 at 18:40
  • @pabrams By your definition *of the word "program"* (that says only machine code programs are programs) most compilers cannot compile any program. – user253751 Jun 28 '16 at 20:21
  • @pabrams So if the compiler is written in RPython, and I run it in the normal Python interpreter, then because the compiler's source code is not a compiler, then either I cannot compile anything this way, or the Python interpreter is a compiler? – user253751 Jun 28 '16 at 20:22
  • @immibis For the umpteenth time, I have not provided a definition of program. Go join the chat we started in the next answer. – pabrams Jun 28 '16 at 20:26
9

How can a compiler compile itself, or what does this statement mean?

It means exactly that. First of all, some things to consider. There are four objects we need to look at:

  • The source code of any arbitrary CoffeScript program
  • The (generated) assembly of any arbitrary CoffeScript program
  • The source code of the CoffeScript compiler
  • The (generated) assembly of the CoffeScript compiler

Now, it should be obvious that you can use the generated assembly - the executable - of the CoffeScript compiler to compile any arbitrary CoffeScript program, and generate the assembly for that program.

Now, the CoffeScript compiler itself is just an arbitrary CoffeScript program, and thus, it can be compiled by the CoffeScript compiler.

It seems that your confusion stems from the fact that when you create your own new language, you don't have a compiler yet you can use to compile your compiler. This surely looks like an chicken-egg problem, right?

Introduce the process called bootstrapping.

  1. You write a compiler in an already existing language (in case of CoffeScript, the original compiler was written in Ruby) that can compile a subset of the new language
  2. You write a compiler that can compile a subset of the new language in the new language itself. You can only use language features the compiler from the step above can compile.
  3. You use the compiler from step 1 to compile the compiler from step 2. This leaves you with an assembly that was originally written in a subset of the new language, and that is able to compile a subset of the new language.

Now you need to add new features. Say you have only implemented while-loops, but also want for-loops. This isn't a problem, since you can rewrite any for-loop in such a way that it is a while-loop. This means you can only use while-loops in the source code of your compiler, since the assembly you have at hand can only compile those. But you can create functions inside your compiler that can pase and compile for-loops with it. Then you use the assembly you already have, and compile the new compiler version. And now you have an assembly of an compiler that can also parse and compile for-loops! You can now go back to the source file of your compiler, and rewrite any while-loops you don't want into for-loops.

Rinse and repeat until all language features that are desired can be compiled with the compiler.

while and for obviously were only examples, but this works for any new language feature you want. And then you are in the situation CoffeScript is in now: The compiler compiles itself.

There is much literature out there. Reflections on Trusting Trust is a classic everyone interested in that topic should read at least once.

Polygnome
  • 7,639
  • 2
  • 37
  • 57
  • Soo.... a compiler *cannot* compile itself, right? That's why bootstrapping is needed. And therefore the sentence you quoted does *not* mean "exactly that"...? Unless when you say "it" you're referring to the sentence that you did not quote instead of the one you did quote. – pabrams Jun 20 '16 at 20:09
  • 5
    (The sentence "The CoffeeScript compiler is itself written in CoffeeScript", is true, but "A compiler can compile itself" is false.) – pabrams Jun 20 '16 at 20:15
  • 5
    No, its completely true. The compiler *can* compile itself. It just doesn't make sense. Say you have the executable that can compile Version X of the language. You write a compiler that can compile Version X+1, and compile it with the compiler you have (which is version X). You end up with an executable that can compile version X+1 of the language. Now you could go and use that new executable to re-compile the compiler. But to what end? You already *have* the executable that does what you want to. The compiler can compile *any* valid program, so it completely can compile itself! – Polygnome Jun 21 '16 at 11:04
  • "The compiler can compile itself. It just doesn't make sense." I disagree, firstly the initial build with the "starting compiler" may build with reduced functionality or optimisation. Secondly the "starting compiler" may have known bugs that could cause defects in the new compiler. Thirdly compiling the compiler is a good "smoke test". – plugwash Jun 23 '16 at 19:45
  • 1
    Indeed it's not unheard of to build quite a few times, iirc modern freepascal builds the compiler a total of 5 times. – plugwash Jun 23 '16 at 19:47
  • @plugwash The point is that when you fix some bug, you change the source code and you don't compile the compiler itself, but a future version of the compiler. I think at least thats what pabrams was trying to say. I absolutely agree on your last point, though. – Polygnome Jun 23 '16 at 20:36
  • @Polygnome In your anecdote, the compiler never compiles itself. It's always compiling a different instance of the compiler, never its own instance. In order for you to run the compiler, it must already be compiled. – pabrams Jun 23 '16 at 20:57
  • That is true. But still, it *can* compile itself and this might actually be done some times, e.g. to verify it (compile future version, use future version to compile itself, compare binaries). – Polygnome Jun 23 '16 at 21:12
  • @Polygnome but the future version is not *itself*! Think of it this way, you can use a hammer to make another identical hammer, but you can't use a hammer to make itself. Or think of it this way: the compiler does not exist until it's compiled. You can't recompile it when it's already compiled. If you recompile the source code, you produce a different instance of the same compiler. – pabrams Jun 23 '16 at 21:16
  • A hammer is a physical object. This doesn't apply to digital ones. Lets say source code is dentoted by letters, and an executable is letter+X. The first version of the compilers source code in the foreign language is A, the executable is AX. You then write the first version of the compiler in its own language, which is B. You use AX to compile B, yielding BX. Now you can use BX to compile B again, which yields BX. Using BX to compile B will always again yield BX, thus, the compiler compiles itself. works for new versions, too. – Polygnome Jun 23 '16 at 21:36
  • We don't usully differentiate the source code and the generatedexecutable. A compiler can compile *any* arbitrary program, so it *can* compile itself. If you want to split hairs, you could argue that the compilers source code and its executable are different things, and that the proper phrasing would be "The compiled compiler (= the generated compiler) can compile the source code of the compiler". But that is bollocks and not how things are usually phrased. – Polygnome Jun 23 '16 at 21:39
  • @Polygnome We *should* distinguish between the source code and the compiled code. You might say I'm splitting hairs, but to say a compiler can compile itself is incorrect. A compiler compiles source code, but a compiler is a binary file, therefore by definition it cannot compile itself. You can say that by convention we speak this way, but not everyone knows the convention. We should use our words in order to be understood by everyone as much as possible, not just by programmers. I think it's bollocks that you say it's bollocks - that *is* how things are correctly phrased. – pabrams Jun 23 '16 at 22:30
  • @Polygnome You don't have to say "The compiled compiler" - a compiler is already assumed to be compiled. That's why it can't compile itself! If you say a compiler can compile itself, then you're not using your words properly. – pabrams Jun 23 '16 at 22:33
  • For all intents and purposes, everyone who is able to write programs and knows how compilations works will understand the statement "The compiler can compile itself". The rest is just splitting hairs for splitting hairs sake. In an compiled language, you can *obviously* not directly execute the source code. – Polygnome Jun 24 '16 at 05:52
  • Btw, no, "the compiler" is *not* the generated executable. If I ask someone to give me a link to "program X" then I'm fine with getting a link to a repo with source code and all tools needed to build it. If you want to talk about the executable, you should talk about a "build of the compiler". So if you really insist of splitting hairs, the statement would be "A build of the compiler can compile its source", which again, from nearly every reasonable point of view, is imho equivalent to "the compiler can compile itself". – Polygnome Jun 24 '16 at 05:53
  • @Polygnome You can continue to confuse non-programmers if you want by making statements that are literally false. I always try to say things that are true. A compiler cannot compile itself. Frankly, it's ridiculous that you're arguing with me. Why not just change your words so that the statement is true instead of insisting it's okay because most people know what you mean? – pabrams Jun 24 '16 at 12:25
  • @Polygnome Yes, the compiler *is* the executable. A compiler compiles things; that's what the word means. Only the built compiler can compile things. The source code cannot compile anything, therefore it cannot be referred to by itself as a compiler. In your example you changed the word from "compiler" to "program" which is not the same thing - the word program is far more ambiguous. – pabrams Jun 24 '16 at 13:12
  • 1
    @pabrams Writing "Do not touch" and "Hot object. Do not touch" makes no difference to the intended message of the phrase. As long as the intended audience of the message (Programmers) understand the intended message of the phrase (A build of the compiler can compile its source) regardless of how it is written, this discussion is pointless. As it stands now, your argument is invalid. Unless you are able to show that the intended audience of the message is non-programmers, then, and only then, you are correct. – DarkDestry Jun 24 '16 at 23:27
  • @DarkDestry Of course it's valid for programmers. Are you saying programmers shouldn't use proper English? You are being ridiculous.As for "do not touch" and "hot object", I have no idea what you're babbling about. If what you mean is that a build of the compiler can compile its source, then just say so! If you say a compiler can compile "itself", you're just wrong, and arguing that it's a true statement is what is pointless. A compiler can compile things. A compiler's source code CANNOT compile things. What's so difficult for you to understand about that? – pabrams Jun 25 '16 at 13:08
  • 2
    @pabrams 'Good english' is english that communicates ideas clearly to the intended audience, and in the way that the writer or speaker intended. If the intended audience is programmers, and programmers understand it, its good english. Saying "Light exists as both particles and waves" is fundamentally equivalent to "Light exists as both photons and electromagnetic waves". To a physicist, they mean literally the same thing. Does that mean we should always use the longer and clearer sentance? No! Because it complicates reading when the meaning is already clear to the intended audience. – DarkDestry Jun 25 '16 at 14:34
  • Maybe good English was the wrong way for me to put it. We should just avoid saying things that are literally false, such as "the compiler can compile itself". The phrase "the compiler can compile its own code" is really not much longer, and is not false like the first. Not all programmers even understand exactly what a compiler is. A question was asked about what the phrase meant, and some of the answers were "exactly what it says", which is false. This is not about particles and waves, it's about using your words properly. You want to sacrifice truth to save a few characters. Whatever. – pabrams Jun 26 '16 at 15:58
  • 1
    @pabrams The question asked was what does the quoted statement mean. The quoted statement is in all ways, true and exactly what it means. (Side note: The reason the quoted statement was misunderstood is because the writer of that statement decided to use "Better english" and obscured the meaning of that statement. If he says 'The coffeescript compiler is written in coffeescript' (Note how the itself is completely unnecessary), the question would instead become "How did the first compiler compile without a coffeescript compiler?") – DarkDestry Jun 26 '16 at 16:32
  • Yes, the itself is unnecessary. Yes, there's nothing wrong with saying "the coffescript compiler is written in coffescript". The false statement is "a compiler can compile itself" as I've repeatefly stated. Its rhe answer i had a problem with , you know, the one we are commentING on, where he quotes the wrong sentence and says it means exactly that....By Jove, I think you're finally starting to get it! – pabrams Jun 27 '16 at 17:18
  • 1
    @pabrams By your definition, most compilers cannot compile any program. – user253751 Jun 28 '16 at 05:26
  • @immibis I didn't even use the word program in my definition. I don't know why you're so hung up on that word, but that's probably why you're not understanding. – pabrams Jun 28 '16 at 14:34
  • @pabrams Wait... I need to clarify this. By your definition, can a compiler compile a program? – DarkDestry Jun 28 '16 at 17:28
  • @DarkDestry By my definition of what? I've only defined compiler, not program. You define program, and I'll tell you whether or not a compiler can compile it, but you should be able to answer yourself based on existing discussion. I've stated repeatedly that a compiler compiles source code but not byte code. If by 'program' you mean source code, then the answer is yes, of course an appropriate compiler can compile it, assuming that the code is valid. Why are you asking me this? I've clarified my position many times now. It should be obvious. Just read the comments. – pabrams Jun 28 '16 at 18:19
  • @pabrams By my definition "A compiler compiling itself" is interpreted, without second thought, as "A compiler compiling the source code of the compiler" which, is true in all ways. – DarkDestry Jun 28 '16 at 18:33
  • @DarkDestry Then you need to give it a second thought: A compiler's source code cannot compile anything, therefore a compiler's source code is not a compiler. – pabrams Jun 28 '16 at 18:38
  • 1
    @pabrams If you want to put this into a full sentence "A compiler compiling itself" is interpreted as "A compiler (program) compiling (the source code of) itself". The words in the brackets are intuitive. – DarkDestry Jun 28 '16 at 18:41
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/115872/discussion-between-pabrams-and-darkdestry). – pabrams Jun 28 '16 at 18:41
7

A small but important clarification

Here the term compiler glosses over the fact that there are two files involved. One is an executable which takes as input files written in CoffeScript and produces as its output file another executable, a linkable object file, or a shared library. The other is a CoffeeScript source file which just happens to describe the procedure for compiling CoffeeScript.

You apply the first file to the second, producing a third which is capable of performing the same act of compilation as the first (possibly more, if the second file defines features not implemented by the first), and so may replace the first if you so desire.

nbro
  • 15,395
  • 32
  • 113
  • 196
5
  1. The CoffeeScript compiler was first written in Ruby.
  2. The CoffeeScript compiler was then re-written in CoffeeScript.

Since the Ruby version of the CoffeeScript compiler already existed, it was used to create the CoffeeScript version of the CoffeeScript compiler.

enter image description here This is known as a self-hosting compiler.

It's extremely common, and usually results from an author's desire to use their own language to maintain that language's growth.

nbro
  • 15,395
  • 32
  • 113
  • 196
Trevor Hickey
  • 36,288
  • 32
  • 162
  • 271
3

It's not a matter of compilers here, but a matter of expressiveness of the language, since a compiler is just a program written in some language.

When we say that "a language is written/implemented" we actually mean that a compiler or interpreter for that language is implemented. There are programming languages in which you can write programs that implement the language (are compilers/interpreters for the same language). These languages are called universal languages.

In order to be able to understand this, think about a metal lathe. It is a tool used to shape metal. It is possible, using just that tool, to create another, identical tool, by creating its parts. Thus, that tool is a universal machine. Of course, the first one was created using other means (other tools), and was probably of lower quality. But the first one was used to build new ones with higher precision.

A 3D printer is almost a universal machine. You can print the whole 3D printer using a 3D printer (you can't build the tip that melts the plastic).

Paul92
  • 8,827
  • 1
  • 23
  • 37
  • I like the lathe analogy. Unlike the lathe analogy, though, imperfections in the first compiler iteration are passed along to all subsequent compilers. For example, an above answer mentions adding a for-loop feature where the original compiler only uses while loops. The output understands for-loops, but the implementation is with while loops. If the original while loop implementation is flawed or inefficient, then it always will be! –  Jun 21 '16 at 14:58
  • @Physics-Compute that is simply wrong. In the absence of malice defects don't usually propagate when compiling a compiler. – plugwash Jun 23 '16 at 19:54
  • Assembly translations certainly do get passed from iteration to iteration until the assembly translation is fixed. New features that build off old features do not change the underlying implementation. Think about it for a while. –  Jun 23 '16 at 23:42
  • @plugwash See "Reflections on Trusting Trust" by Ken Thompson - https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf –  Mar 30 '17 at 16:19
3

Proof by induction

Inductive step

The n+1th version of the compiler is written in X.

Thus it can be compiled by the nth version of the compiler (also written in X).

Base case

But the first version of the compiler written in X must be compiled by a compiler for X that is written in a language other than X. This step is called bootstrapping the compiler.

Guy Argo
  • 397
  • 4
  • 10
  • 1
    The very first compiler compiler for language X can easily be written in X. How that is possible is that this first compiler can be **interpreted**. (By an X interpreter written in a language other than X). – Kaz Jun 23 '16 at 00:21
1

While other answers cover all the main points, I feel it would be remiss not to include what may be the most impressive example known of a compiler which was bootstrapped from its own source code.

Decades ago, a man named Doug McIlroy wanted to build a compiler for a new language called TMG. Using paper and pen, he wrote out source code for a simple TMG compiler... in the TMG language itself.

Now, if only he had a TMG interpreter, he could use it to run his TMG compiler on its own source code, and then he would have a runnable, machine-language version of it. But... he did have a TMG interpreter already! It was a slow one, but since the input was small, it would be fast enough.

Doug ran the source code on that paper on the TMG interpreter behind his eye sockets, feeding it the very same source as its input file. As the compiler worked, he could see the tokens being read from the input file, the call stack growing and shrinking as it entered and exited subprocedures, the symbol table growing... and when the compiler started emitting assembly language statements to its "output file", Doug picked up his pen and wrote them down on another piece of paper.

After the compiler finished execution and exited successfully, Doug brought the resulting hand-written assembly listings to a computer terminal, typed them in, and his assembler converted them into a working compiler binary.

So this is another practical (???) way to "use a compiler to compile itself": Have a working language implementation in hardware, even if the "hardware" is wet and squishy and powered by peanut butter sandwiches!

Alex D
  • 29,755
  • 7
  • 80
  • 126
0

Compilers take a high-level specification and turn it into a low-level implementation, such as can be executed on hardware. Therefore there is no relationship between the format of the specification and the actual execution besides the semantics of the language being targeted.

Cross-compilers move from one system to another system, cross-language compilers compile one language specification into another language specification.

Basically compiling is a just translation, and the level is usually higher-level of language to lower-level of language, but there are many variants.

Bootstrapping compilers are the most confusing, of course, because they compile the language they are written in. Don't forget the initial step in bootstrapping which requires at least a minimal existing version that is executable. Many bootstrapped compilers work on the minimal features of a programming language first and add additional complex language features going forward as long as the new feature can be expressed using the previous features. If that were not the case it would require to have that part of the "compiler" be developed in another language beforehand.

nbro
  • 15,395
  • 32
  • 113
  • 196