4

I was looking into compiler bootstrapping, and I looked at how Golang implements bootstrapping from source, i.e., by building the last version of Golang implemented in C and using the generated executable to compile newer Go releases. This made me curious as to how the same could be done with C. Can you construct a C compiler on a computer with literally nothing present on it? If not, then how can I trust that the binary of the compiler I use doesn't automatically fill the binaries it compiles with spyware?

Related question, since the first C compiler was written in B and B was written in BCPL, what was BCPL written in?

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Monke
  • 93
  • 6
  • You can bootstrap using cross-compilation on a different host system. – Some programmer dude Jan 16 '21 at 15:51
  • @Someprogrammerdude But then, I could say that the compiler you're using to cross-compile wasn't compiled by you but by some package maintainer. Even if you did construct your compiler yourself, i.e. by cross-compiling your host system compiler, it would have been compiled with a precompiled binary. – Monke Jan 16 '21 at 16:22
  • 1
    When implementing a compiler for a brand new language in the language itself, one must always use a second language for the very first compiler. If that language is raw binary machine code, assembly, C or something else doesn't matter. This is a very tedious and time-consuming process, but thankfully a process that only needs to be done *once!* Once there is a compiler that can compile itself, you no longer need the first primary bootstrap compiler written in another language. So, why do it if you don't need to? [to be continued...] – Some programmer dude Jan 16 '21 at 17:41
  • 1
    If you want to implement a C compiler using C, then use the tools available to you. And those tools include any possible existing C compiler! With that said, it *is* possible to do it from scratch, if there's any suitable language available, but don't expect it to be done quickly or easily. And your time is probably better spent elsewhere. – Some programmer dude Jan 16 '21 at 17:43

3 Answers3

5

Can you construct a C compiler on a computer with literally nothing present on it?

The main issue is how (in 2021) would you write a program for that computer! And how would you input it?

In the 1970s computers (like IBM 360 mainframes) had many mechanical switches to enter some initial program. In the 1960s, they had even more, e.g. IBM1620.

Today, how would you input that initial program? Did you consider using some Arduino ? Even oscilloscopes today contain microprocessors with programs....

Some hobbyists today have designed (and spent a lot of money) in making - a few years ago - computers with mechanical relays. These are probably thousands times slower than the cheapest laptop computer you could buy (or the micro-controller inside your computer mouse - and your mouse contains some software too).

You could also buy many discrete transistors (e.g. thousands of 2N2222) and make a computer by soldering them.

Even a cheap motherboard (like e.g. MSI A320M A-PRO) has today some firmware program called UEFI or BIOS. It is shipped with that program.... and rumored to be mostly written in C (several dozen of thousands of statements).

In some ways, computer chips are "software" coded in VHDL, SystemC, etc... etc...

However, you can in principle still bootstrap a C compiler in 2021.

Here is an hypothetical tale....

Imagine you have today a laptop running a small Linux distribution on some isolated island (à la Robinson Crusoe), without any Internet connection - but with books (including Modern C and some book about x86-64 assembly and instruction set architecture and many other books in paper form), pencils, papers, food and a lot of time to spend. Imagine that system does not have any C compiler (e.g. because you just removed by mistake the gcc package from some Debian distribution), but just GNU binutils (that is, the linker ld and the assembler gas), some editor in binary form (e.g. GNU emacs or vim), GNU bash and GNU make as binary packages. We assume you are motivated enough to spend months in writing a C compiler. We also assume you have access to man pages in some paper form (notably elf(5) and ld(1)...). We have to assume you can inspect a file in binary form with od(1) and less(1).

Then you could design on paper a subset µC of the C language in EBNF notation. With months of efforts, you can write a small assembler program, directly doing syscalls(2) (see Linux Assembly HowTo) and interpreting that µC language (since writing an interpreter is easier than writing a compiler; read for example the Dragon book, and Queinnec's Lisp In Small Pieces and Scott's programming language pragmatics book).

Once you have your tiny µC interpreter, you can write a naive µC compiler in µC (since Fabrice Bellard has been able to write his tinyC compiler).

Once you have debugged that µC compiler, you can extend it to accept all the syntax and semantics of C.

Once you have a full C compiler, you could improve it to optimize better, maybe extend it to accept a small subset of C++, and you might also write a static C code analyzer inspired by Frama-C.

PS. Bootstrapping can be generalized a lot - see Pitrat's blog on bootstrapping artificial intelligence (Jacques Pitrat, born in 1934, died in october 2019) and the RefPerSys project.

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
2

As Some programmer dude stated in a comment, since C is a portable programming language, you can use a compiler for a different platform to produce a cross-compiler that on that platform would produce executables for the target platform.

You then compile that same C compiler for the target platform on that host platform so that the result is an executable for the target platform.

Then you copy that compiler binary onto the target machine and from thereon it is self-hosting.

Naturally at some point in early history someone really had to write something in assembler or machine code somewhere. Today, it is no longer a necessity but a "life choice".


As for the "how can I trust that the binary of the compiler I use doesn't automatically fill the binaries it compiles with spyware?" problem has been solved - you can use two independent compilers to compile the cross-compiler from the same source base and the target and both of those cross-compilers should produce bitwise-identical results for the target executable. Then you would know that the result is either free of spyware, or that the two independent compilers you used in the beginning would infect the resulting executable with exact same spyware - which is exceedingly unlikely.

  • But then, wouldn't you be using 'something' instead of 'nothing' to build your compiler? If the compiler binary you're using is not compiled by you from the start, then how can you trust it? What if the binary to cross-compile is actually built with a specific purpose, i.e. to cleverly embed some form of malware onto the binaries it produces? – Monke Jan 16 '21 at 16:18
1

You can write a really feeble C compiler in assembly or machine code, then bootstrap from there.

Before programming languages existed you just wrote machine code. That was simply how it was done.

Later came assembler, which is like "easy mode" machine code, and from there evolved high-level languages like Fortran and BCPL. These were decoupled from the machine architecture by having a proper compiler to do the translation.

Today you'd probably write something in C and go from there, anything compiled is suitable, though "compiled" is a loose definition now that LLVM exists and you can just bang out LLVM IR code instead of actual machine code. Rust started in OCaml and is now "self-hosted" on top of LLVM, for example.

tadman
  • 208,517
  • 23
  • 234
  • 262