28

I was wondering if today's modern compilers like MS cc, gcc, clang, icc, newer versions were built with the current version of the same compiler?

Because of course of this risk:
http://scienceblogs.com/goodmath/2007/04/15/strange-loops-dennis-ritchie-a/
http://c2.com/cgi/wiki?TheKenThompsonHack

I'm sure everyone involved with the afore-mentioned compilers' development knows about this issue, whereby code is injected into the compiler by an earlier version of itself and propagates invisibly.

Now the real problem, is not really one of backdoors, but much more about code generation correctness isn't it ? How about if somewhere in the build chain some pervert twist was introduced by pure mistake, and today's compiler generate incorrect code, even if the compiler's source look OK, because of the Ken Thompson's flaw?

So if they are built with themselves, how do they protect themselves?

david.pfx
  • 10,520
  • 3
  • 30
  • 63
v.oddou
  • 6,476
  • 3
  • 32
  • 63
  • Compilers today are maintained not by single persons, but by quite sizeable teams. Peer review is the operational term here. Anyway, voting for close, "primarily opinion based". – DevSolar Mar 27 '14 at 08:00
  • Not only is there peer-reviews, there are quite extensive tests for each compiler to go though before being released. Though, even in trivial applications bugs happen, in a non-trivial like modern C or a C++ compilers there is a 100% chance of bugs being in the released versions. – Some programmer dude Mar 27 '14 at 08:01
  • 1
    http://stackoverflow.com/questions/193560/implementing-a-compiler-in-itself http://stackoverflow.com/questions/494372/how-can-the-linux-kernel-compile-itself http://stackoverflow.com/questions/5657454/is-gcc-c-compiler-written-in-c-itself – stijn Mar 27 '14 at 08:03
  • 4
    @DevSolar: `opinion based?` seriously ? and what do you say about perfectly verifiable EJP's comment ? and why don't you give a chance to implementors of answering ? There ARE microsoft's people in this community. I'm sure there also are GCC contributors. – v.oddou Mar 27 '14 at 08:20
  • And what would you like them to say? The answer to your first question - are compilers built with their own binaries - is "yes", as any kind of googling or experience with building a system would have told you even without stijn's help. The second part of your question *is* opinion-based, as it's pretty hard to prove the non-existence of something, especially something like the Thompson Backdoor. As for code generation, what do you think regression tests are for? – DevSolar Mar 27 '14 at 08:37
  • As far as it goes, compilers are not always built with themselves; for example, I've built GCC using the SUNWspro compiler. This would not prevent a backdoor in the code, but it would have to be in the *current* code. – user3303729 Mar 27 '14 at 08:43
  • 1
    Possibly. What about: are conveyor belt systems build in factories using a conveyor belt? Are robots built using robots (or are they made by hand)? Are computers designed using computers (or drawn on paper)? – Patrick Mar 27 '14 at 09:03
  • 4
    To get some confidence in the compiler, you would (a) compile it with multiple compilers from different manufacturers; this will obviously produce different binaries, then (b) compile it with these different binaries, which then should give identical binaries. That's for a compiler that can compile the language it's written in; COBOL compilers might not be written in COBOL. – gnasher729 Mar 27 '14 at 09:08
  • 2
    http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29 – phuclv Mar 27 '14 at 09:23
  • http://en.wikipedia.org/wiki/Self-hosting – phuclv Mar 27 '14 at 09:24
  • Even compiler writers use source control. – Hans Passant Mar 27 '14 at 15:13
  • I'm not going to vote to close any question that Eric Lippert is willing to answer. And it really isn't an exact duplicate. – david.pfx Jul 21 '14 at 13:48
  • @gnasher729 They sure aren't. – user207421 Jul 22 '14 at 04:45
  • @david.pfx: thanks its a nice edit, that resets the question into a "one focus" question, with the title on the main focus. – v.oddou Jul 22 '14 at 05:40

2 Answers2

25

I was wondering if today's modern compilers like MS cc, gcc, clang, icc, newer versions were built with the current version of the same compiler?

The Roslyn C# compiler can build itself; in fact, it is one of its own best test cases. Of course it could not do so on day one or even day 100; it was built with the previous version of the C# compiler, which was written in C++.

How about if somewhere in the build chain some pervert twist was introduced by pure mistake, and today's compiler generate incorrect code, even if the compiler's source look OK

This is a serious concern.

One of the interesting ways you can look for a bug in a self-building compiler is as follows: call the original not-self-building compiler Alpha. Build the new source code with Alpha to produce Beta. Then have Beta build the source code to produce Gamma. Then have Gamma build the source code to produce Delta. If there are significant differences in the binaries produced for Gamma and Delta, you likely have a problem. Beta and Gamma should have the same outputs given the same inputs. (C# in particular does not promise that compiling the same code twice produces exactly the same binary, so you've got to be careful to make sure that your test is sophisticated enough to take that into account.)

The way you mitigate this risk is of course the same way you mitigate any risk associated with bad tools: you check in various versions of the compiler tools into the repository, so that you can roll back to a previous known-good version of the compiler should you need to. And you test the compiler heavily.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • What puzzles me in bootstrap and also Diverse Double Compiling is that I cannot fathom by what miracle should we obtain "identical output" ? Because a new version of a compiler exists because it generates better assembly (more optimized), therefore compiling the compiler with different versions or vendors would invariably result in different outputs, because of varying implementation choices in code generation. proof : http://yosoygames.com.ar/wp/2013/12/visual-studio-unaligned-store-frustration/ – v.oddou Mar 28 '14 at 01:07
  • 3
    @v.oddou: Read my scenario again carefully. Compilers Alpha and Beta need not have the same output; those are two different compilers with different source code and different behaviour. But surely compilers Beta and Gamma should have identical behaviour as they were compiled from the same source code. If the Beta sources compiled with Alpha and the Beta sources compiled with Beta give compilers with different behaviours then something is likely wrong with Beta. – Eric Lippert Mar 28 '14 at 01:14
  • Thanks for help :) Ok, I;ve made this diagram: http://postimg.org/image/p4lqovfwn/ now I would grant you that test's binary 1 and 2 should be equivalent (modulo `__TIME__` macros stuff) but `Orange Beta` and `Blue Beta` would be different binaries, and though input/output symetry should be respected, the time it will take to `Orange` to compile will be different than `Blue`s because of different level of optimization in `Alpha` and `Beta`. So the 2 binaries are actually very different. I still miss something – v.oddou Mar 28 '14 at 02:01
  • I think I'm on the way to understand (that's why I accepted your answer this time), `Gamma` is "second-chained built" therefore it should be equivalent to `Beta` because it is just a rebuild ? Is that provable ? – v.oddou Jul 23 '14 at 02:10
5

In general the answer is 'yes', for compilers implemented in their own languages. Building the compiler with itself is one of the best tests for correctness. Successive runs should keep producing the same binary. 'GC' for example is built with a four-stage bootstrap process.

Of course some languages can't be used for compiler-writing.

EDIT It should be made clear that this answer was posted when the substantive question was "Are compilers built with previous version of themselves?" It has subsequently been changed.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • 3
    "Of course some languages can't be used for compiler-writing." This statement is untrue unless we're including really domain-specific languages. Any Turing-complete language with I/O can be used to implement a compiler for any other language. – nibot Mar 27 '14 at 09:19
  • 3
    @nibot In practice that isn't so. It isn't economically feasible to write a COBOL compiler in COBOL, for example, and only a madman would try it. I didn't. Too many things are missing: recursion, for a start. – user207421 Mar 27 '14 at 09:31
  • 2
    @nibot: I challenge you to write a self-hosting PL/SQL compiler, or ABAP... ;-) Let's just say, some languages cannot *reasonably* be used for compiler writing. (Although I've heard of BF-written BF compilers, and BF is a language that cannot be mentioned with "reasonable" in the same sentence without quotation marks. ;-) ) – DevSolar Mar 27 '14 at 10:40
  • @nibot You're also ignoring the issue of the runtime library. You can't write a runtime library for Cobol in Cobol. – user207421 Mar 29 '14 at 00:20