Confusions arising from a programming language whose compiler is written in itself

Question

By accident I knew that the compiler of Haskell is written in Haskell. It sounds strange to me. How is this possible, I mean, to compile itself? Who is to compile the compiler then? What is the ultimate code accepted by machine?

Consider the programming language who is the first to have a compiler. What is the language of its compiler? Going back even farther in time, how did people program before the era of compiler?

Broadly speaking, I am often confused about the border between software (e.g. programming written by people) and hardware (e.g. something executable on a physical machine).

P.S.: I have basic knowledge about compiler such as lexical analysis, parsing, and code optimization. However, I know little about hardware (the machine).

It seems that the answer to a related post Implementing a compiler in “itself” does not go deeply into the border between software and hardware.
And I would like to see some concrete examples.

EDIT: Some comments mentioned the term "bootstrapping". It seems that there is some minimum core part of a language (like axioms/basic theorems in mathematics) which must be compiled in a lower-level way (instead of by itself). What are they? Are they basically the same in different languages? Again, I would like to see some concrete examples.

It's like asking who wrote the program that Lady Lovelace copied from to write her stuff -- such mysteries are best left unsolved. — Hot Licks, Jul 12 '14 at 02:36
PS: Do you have any rational argument that one could not write a C compiler in C? — Hot Licks, Jul 12 '14 at 02:37
There is a theory in philosophy/logic that was proposed by Bertrand Russell and company that a "self-reflexive" argument is invalid. This is intuitive in a way, and it thus makes one feel uneasy when discussing the self-reflexive nature of many compilers. (I personally worked on a Pascal compiler that was compiled using itself.) — Hot Licks, Jul 12 '14 at 02:45
@HotLicks I am not able to argue that one could not write a C compiler in C. However, I am not able to justify that one can do this either. I am confused about the fact that how is a piece of text (written in some programming language) translated and finally executable on a machine. — hengxin, Jul 12 '14 at 02:46
If you want to really blow your mind it has been demonstrated that one can embed malware into a compiler (that corrupts certain operations, eg) in such a way that it wil not be visible in the compiler source, but will still reproduce itself when the compiler is recompiled with itself. — Hot Licks, Jul 12 '14 at 02:47
How a compiler works is both a very simple thing to explain and very complicated to understand. — Hot Licks, Jul 12 '14 at 02:49
Are you confused about what bootstrapping is? Usually one would write a simple compiler in a language like C to compile the core language, just enough to write a new compiler in the language you just wrote. Then you can start writing more of the language _in itself_, compiling the code as you go to add more and more powerful features. This is very similar to the mathematical concepts of building upon previously proven theorems. You may ask "what compiled the C compiler?", and the answer is probably Assembly. What compiled Assembly? Likely punch cards if you go back far enough. — bheklilr, Jul 12 '14 at 02:53
@HotLicks Yes, I think the "self-reflexive" by Bertrand Russell is to the point. So how does a compiler compiled by itself live with it? — hengxin, Jul 12 '14 at 02:54
The process is known as bootstrapping, you use another language to build the basic compiler out until it's capable of compiling itself. It sounds complicated but it's essentially just writing your high-level compiler at a lower-level initially until it's self-hosting. — Stephen Diehl, Jul 12 '14 at 02:54
(To further confuse you, one of the old supercomputers (maybe the Illiac IV) had a "hardwired" compiler -- it was not a program but digital logic on printed circuit boards.) — Hot Licks, Jul 12 '14 at 02:55
possible duplicate of [Implementing a compiler in "itself"](http://stackoverflow.com/questions/193560/implementing-a-compiler-in-itself) — user207421, Jul 12 '14 at 03:00
(I am reminded that the computer with the hardwired compiler was [Symbol](http://www.computer.org/csdl/mags/co/1981/07/01667437.pdf). Unfortunately this computer system is just about impossible to research since any search for "Symbol computer" will get about a million "hits" on pages discussing computer symbols.) — Hot Licks, Jul 12 '14 at 03:05
@bheklilr I only know the term "bootstrapping" in the context of operating systems. Try to understand your "mathematical" idea: there are some **axioms/core theorems** which must be compiled in a low-level way (such as Assembly or even punch cards). If so, would you mind illustrating some "core theorems"? Are they basically the same in different languages? — hengxin, Jul 12 '14 at 03:08
Step 1a: Write a naive compiler for your language in another language. Step 1b: Write an interpreter for your language in another language. Step 1c: Interpret/compile your language *by hand*. Step 2: Write a compiler for your language in your language. Step 3: Use the technique developed in Step 1 to compile the code developed in Step 2. You now have a reasonable compiler for your language in your language. Repeat steps 2 and 3 as you add features, except use the previous version rather than the step 1 version. — Levi Pearson, Jul 12 '14 at 03:10
@LeviPearson How naive it can be? What is the minimum core that must be compiled in another language (as `@bheklilr` said)? Could you please give some concrete examples? — hengxin, Jul 12 '14 at 03:15
@hengxin Bootstrapping is different in this context, I'd recommend reading the wikipedia page on compiler bootstrapping for a complete description. And I would argue that the axioms in mathematics _are_ your low level descriptions. The axioms are descriptions of the core structure of your language, and the different objects you can represent with those axioms are your "language". I'll defer to [this gist](https://gist.github.com/bheklilr/a868ff4b2bb915e0302b) for a longer explanation (please check back in about 20 minutes for me to put something in there). — bheklilr, Jul 12 '14 at 03:17
By "naive" I mean complete enough to run the compiler you wrote, but with no serious attempt to be fast or efficient. It has to support precisely the set of features that you use in the "real" compiler. — Levi Pearson, Jul 12 '14 at 04:25
@bheklilr Another comment there. Maybe GitHub (gist) should provide the notification mechanism. — hengxin, Jul 12 '14 at 07:02
@hengxin They really should implement it, a quick google search tells me that github has yet to put that often requested feature in, though. — bheklilr, Jul 12 '14 at 14:08
A simple example: I once "wrote" a Pascal compiler for IBM AS/400. Actually, what I did was take the VS Pascal compiler source, edit it to generate code for AS/400 instead of S/370, and then compile that through the VS Pascal compiler. The result was a "cross compiler" that would run on S/370 but generate code for AS/400. Then I ran the same source through the "cross compiler" and I had a compiler that would run on AS/400. Finally, I ran the same source through the compiler running on AS/400 and I had a compiler that was compiled on AS/400, ran on AS/400, and produced code for AS/400. — Hot Licks, Jul 12 '14 at 17:26
And if the self-reflexive nature of computer stuff bugs you, consider that, in Java, the class Class is an instance of the class Class. — Hot Licks, Jul 13 '14 at 02:31

score 4 · Answer 1 · answered Jul 12 '14 at 08:27

As you can read in A history of Haskell page 28, the first haskell compiler was written in Lazy ML in June 1989. It implemented essentially all of Haskell 1.0.

Now that this compiler existed, it can then be used to compile the Haskell version of GHC. The first beta of GHC written in Haskell was release on 1 April 1991. The full release came in December 1992.

Because the Lazy ML-based compiler wasn't developed further, today you use a previous version of GHC to compile GHC. So if you want to build GHC 7.8, you use GHC 7.6 to build it (in practice, it's a bit more complicated, because there are multiple stages and only the first stage, which doesn't support GHCi or TemplateHaskell is built with GHC 7.6)

That means that if you don't have a working haskell compiler today, you have two options:

Try to install a LML compiler and compile the first version of GHC written in Lazy ML. Then use this compiler to compile the next version which is written in Haskell. Then again use that compiler to build the next version, and repeat until you have a reasonably recent compiler. It may be possible to skip a few versions, but I don't know how many. As you can imagine, this could take a lot of time.
(Much easier) Download pre-built GHC binaries.

JimC · Answer 2 · 2014-11-10T05:30:43.947

Um... I have not tried this, but another route would be simply compiling to c and using a c compiler to compile latest ghc...ghc is itself built in stages, so you don't really even need to convert the whole code base to c, just the first stage, which then can compile the rest. Certainly no need to dig up Lazy ML.

Edit: Note the resulting compiler will not build binaries targeting the new platform, it would simply run on that platform and be a cross-platform compiler for targets that ghc already has backends for. Another note is that i actually intended this in response to bennofs answer, not as stand alone answer to the OP.

Confusions arising from a programming language whose compiler is written in itself

2 Answers2