What is the reason that languages like C, C++ and similar compile their code down to assembler code, instead of just producing the binary directly? Is it just too hard to infer the "correct" programming from the abstracted language? It seems to me that converting to something that will again be converted is not an optimal way of doing things, but there are probably good reasons for this that I am unaware of. Is this connected to every CPU architecture having different implementations?
0
-
4 There are plenty of compilers that compile directly to machine code, probably most of them these days. – Ross Ridge Jan 13 '20 at 23:29
-
7 *What is the reason that languages like C, C++ and similar compile their code down to assemble* They don't. – President James K. Polk Jan 13 '20 at 23:29
-
Assemblers and linkers already exist, so if you want to write a new compiler that's fewer things to worry about. E.g. `gcc` doesn't care whether you want an ELF or PE format binary; just use the proper assembler and linker. – Jester Jan 13 '20 at 23:31
-
The architecture is already taken into account when compiling down to assembly. – Christian Gibbons Jan 13 '20 at 23:32
-
@Jester I guess that makes sense. – C. K. Jan 13 '20 at 23:39
-
3 What compiler are you talking about? Why do you think it generates assembly rather than binary? – HolyBlackCat Jan 13 '20 at 23:39
-
@HolyBlackCat Because this is what we were taught in my university course for low level programming. Granted, that was specifically for the EFM32GG microcontroller. – C. K. Jan 13 '20 at 23:41
-
1 It is much easier to debug the compiler if it outputs human-readable code instead of object files. – 0___________ Jan 13 '20 at 23:42
-
2 gcc generates assembly. clang can work with either an integrated or an external assembler. – Jester Jan 13 '20 at 23:44
-
1 I haven't used a compiler that produced assembly since the 1980s. – user207421 Jan 13 '20 at 23:49
-
@RossRidge Given the amount of disagreement in the comments, and that your linked "this question has an answer here" post doesn't actually have an accepted answer, I think you should reopen the question and let it be answered properly. – C. K. Jan 13 '20 at 23:51
-
1 @user207421 Then you haven't run `gcc`. Even if you don't see the assembly and don't need to run the assembler by hand, it's still there. You can use `gcc -v` or `strace`, or even catch the assembler running in a process monitor if you are lucky and compile a lot of files. – Jester Jan 13 '20 at 23:52
-
@Jester: I'm curious as to why they would produce assembly? Why would a textual form of processor instructions which are then used to produce an object file be better than simply going right to the object file? Why are two passes better than one? – President James K. Polk Jan 14 '20 at 00:03
-
2 @C.K. That's not how it works. If you or anyone else thinks they have a better, more "proper" answer, they can post it to the linked question. We don't need redundant (or just plain wrong) answers posted here. In any case, the linked question does have a proper answer to your question by Peter Cordes. – Ross Ridge Jan 14 '20 at 00:10
-
Compiling to asm is the sane way to do it. The intermediate files are often hidden/destroyed, as with gcc, but you can ask for them to be kept so you can inspect them. The gcc program itself is just a shell around a few programs that preprocess, compile, assemble and link, using files in between. Very much the Unix way. Going straight to machine code has some corner cases where it makes sense, but in general it produces a less reliable product, as it is more difficult to write the code and to inspect and debug the output. In general it adds no value to go straight to machine code. – old_timer Jan 14 '20 at 01:56
-
Ideally you compile to some intermediate thing (bytecode, icode, tables and structures, etc.) with a front end, middle, and back end. With clang the middle can be saved/expressed in separate files from the compiled language, and then you can use an external assembler or the internal one as mentioned, but you can also use that language directly. With gcc it appears to be internal, but the suite of tools allows multiple languages to feed into the same middle, taking advantage of the remaining portions of the toolchain without duplication, including using the assembler to avoid that duplication of effort/debug/risk. – old_timer Jan 14 '20 at 01:59
-
A different way to say it: it is part of a modular approach that reduces work, risk, and duplication of effort, and allows for greater flexibility and reusability. It implies that the toolchain was designed rather than thrown together. (Granted, examine the internals of gcc, or clang for that matter, and the implementation is held together with duct tape and baling wire and barely works, but at least at a high level they were designed tools.) – old_timer Jan 14 '20 at 02:02
1 Answer
-5
Assembler code and the binary are logically equivalent; the assembler code is just represented using so-called mnemonics, which are a more human-readable form of the machine instructions.
All compilers do directly produce a binary.
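
The equivalence between mnemonics and machine code described above can be sketched with a toy assembler. This is only an illustration under stated assumptions: the names `OPCODES`, `assemble`, and `disassemble` are hypothetical, and the table covers just a few single-byte x86 instructions that take no operands. A real assembler also handles operands, labels, relocations, and object-file formats.

```python
# Toy illustration: each mnemonic is a readable name for a byte pattern.
# Only a few single-byte, operand-free x86 opcodes are listed here.
OPCODES = {
    "nop": 0x90,   # no operation
    "ret": 0xC3,   # near return
    "int3": 0xCC,  # breakpoint trap
    "hlt": 0xF4,   # halt
}


def assemble(source):
    """Translate one mnemonic per line into machine-code bytes (hypothetical helper)."""
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.split(";", 1)[0].strip().lower()  # drop ';' comments
        if not mnemonic:
            continue  # skip blank or comment-only lines
        if mnemonic not in OPCODES:
            raise ValueError(f"unsupported mnemonic: {mnemonic!r}")
        out.append(OPCODES[mnemonic])
    return bytes(out)


def disassemble(code):
    """Map the bytes back to mnemonics, showing the translation is reversible."""
    names = {byte: name for name, byte in OPCODES.items()}
    return [names[b] for b in code]


if __name__ == "__main__":
    print(assemble("nop\nnop\nret").hex())  # prints 9090c3
```

Because the mapping is one-to-one for these instructions, `disassemble(assemble(text))` gives back the same mnemonics, which is the sense in which the two forms are equivalent.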

pathetic-lynx
-
*"All compilers do directly produce a binary."* My understanding is that this is not (necessarily) true. In my low level programming course at university we were taught that our code was compiled from C down to Assembler before everything is linked to produce an executable binary file. – C. K. Jan 13 '20 at 23:39
-
@C.K. Compilers *do* produce binary files. Linking those correctly then produces binary executable files. A compiler can even produce a binary executable directly if linking is not required. Compilers do not normally produce assembly, but they usually *can* (e.g. `gcc -S`). – Marco Bonelli Jan 13 '20 at 23:42
-
1 Your university course is wrong. Historically, yes, the first C compilers generated assembly, which was then assembled. However, nowadays they produce machine code, which is binary. – ChrisMM Jan 13 '20 at 23:42
-
Are you sure you mean Assembler and not object files? Object files still need to be linked with other object files (if you call a function that is in a different source file) but they are nevertheless in machine code. – pathetic-lynx Jan 13 '20 at 23:44
-
3 @ChrisMM gcc, clang, and many others generate assembly code first. The assembler is called automatically and you do not even see it. Just tell the compiler to keep temporary files. – 0___________ Jan 13 '20 at 23:44
-
`All compilers do directly produce a binary.` Nowadays very few do, actually. Definitely none of the most commonly used ones. – 0___________ Jan 13 '20 at 23:45
-
@P__J__ True that. Today's compilers compile, assemble and link, all with a single execution. – Marco Bonelli Jan 13 '20 at 23:46
-
3 @MarcoBonelli You might use a single **invocation**, but for example gcc (which is a compiler driver nowadays) will call the assembler and linker for you. You can use `gcc -v` to see what it's doing. – Jester Jan 13 '20 at 23:47
-
@P__J__, is it assembly, or an intermediate platform independent form? – ChrisMM Jan 13 '20 at 23:48
-
@MarcoBonelli Many compilers in the past did that, but most modern ones just compile to assembler. – 0___________ Jan 13 '20 at 23:49
-
@ChrisMM `clang` can also emit LLVM IR instead of assembly, which is a platform-independent intermediate representation. – Marco Bonelli Jan 13 '20 at 23:50
-
@MarcoBonelli, that's what I assume compilers would be doing, instead of assembly itself. That's even how I made my Oberon compiler a _few_ years back, an intermediate representation only. – ChrisMM Jan 13 '20 at 23:53
-
@P__J__: clang compiles straight from C to an object file, only producing asm output if you ask for it. `gcc` literally does produce a `.s` temporary file (or pipe) which it runs a separate `as` program on, as part of producing a `.o`. **See my answer on [Does a compiler always produce an assembly code?](//stackoverflow.com/a/53818152)** (this question is closed as a duplicate of that) – Peter Cordes Jan 14 '20 at 01:38
-
@MarcoBonelli: clang only emits LLVM-IR or asm if you ask for it. Normally it keeps the representation of the machine instructions internal to the compiler, and only writes out a `.o` object file. (Which then is linked into an executable.) – Peter Cordes Jan 14 '20 at 01:41
-
@PeterCordes Yes, which is exactly why I initially wrote that compilers normally output a binary. – Marco Bonelli Jan 14 '20 at 10:11
-
@MarcoBonelli: you initially wrote that compilers don't normally produce asm, but they can if you use `gcc -S`. GCC was a poor example: it always compiles to a temporary `.s` even if you don't ask it to keep that `.s` around. Most other compilers do go straight to object files. Anyway, I forget which of your other comments I might have been replying to; not a big deal either way. – Peter Cordes Jan 14 '20 at 13:16
-
1 @MarcoBonelli: I think you're still missing the point. GCC *always* produces a `.s` (or a pipe). It doesn't have a built-in assembler as part of the compiler binary `cc1`. The `-S` option just decides whether to keep that file around instead of assembling it + deleting it. Of course it *eventually* produces an object file or a linked binary, but that's not what this question is about. – Peter Cordes Jan 14 '20 at 13:21
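
The separate compile, assemble, and link stages discussed in these comments can be spelled out as explicit commands. The following is a rough sketch, not a description of what the `gcc` driver does internally: the helpers `pipeline_commands` and `run_pipeline` are hypothetical, the file names are placeholders, and it assumes a GNU-style toolchain where `gcc` and `as` are on the `PATH`.

```python
import shutil
import subprocess
from pathlib import Path


def pipeline_commands(c_file):
    """Return the compile, assemble, and link commands as separate steps."""
    stem = Path(c_file).with_suffix("")
    asm, obj, exe = f"{stem}.s", f"{stem}.o", str(stem)
    return [
        ["gcc", "-S", c_file, "-o", asm],  # compile only: C source -> assembly text
        ["as", asm, "-o", obj],            # assemble: assembly text -> object file
        ["gcc", obj, "-o", exe],           # link, using gcc as the linker driver
    ]


def run_pipeline(c_file):
    """Run each step if the tools are installed; return False if they are not."""
    if not (shutil.which("gcc") and shutil.which("as")):
        return False
    for cmd in pipeline_commands(c_file):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True


if __name__ == "__main__":
    for cmd in pipeline_commands("hello.c"):
        print(" ".join(cmd))
```

Running the steps by hand like this leaves the `.s` file on disk, so the intermediate assembly can be inspected between stages.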