I know this is a very basic question, but when I compile my C/C++ code with gcc/g++, what exactly is the type of the intermediate output before the assembler comes into play to generate the machine code? Is it something like x86 instructions?
-
What _intermediate_ output do you mean? `g++` generates assembly directly. – Lol4t0 Feb 11 '12 at 20:58
-
The following options may be useful: `-g -Wa,-ahl=main.s`. They cause GCC/G++ to emit the assembly with interleaved high-level source code. – Joseph Mansfield Feb 11 '12 at 21:16
-
This is not a basic question at all. – cha0site Feb 11 '12 at 21:23
6 Answers
GCC's processing chain is as follows:
1. your source code
2. preprocessed source code (expand macros and includes, strip comments) (`-E`, `.ii`)
3. compile to assembly (`-S`, `.s`)
4. assemble to binary (`-c`, `.o`)
5. link to executable
At each stage I've listed the relevant compiler flags that make the process stop there, as well as the corresponding file suffix.
If you compile with `-flto`, then object files will be embellished with GIMPLE bytecode, which is a type of low-level intermediate format, the purpose of which is to delay the actual final compilation to the linking stage, which allows for link-time optimizations.
The "compiling" stage proper is where the actual heavy lifting happens. The preprocessor is essentially a separate, independent tool (although its behaviour is mandated by the C and C++ standards), and the assembler and linker are actually separate, free-standing tools that basically just implement, respectively, the hardware's binary instruction format and the operating system's loadable executable format.
-
+1, excellent answer. You might want to add a blurb about what assembly is, because it seems the asker isn't exactly clear on that. – cha0site Feb 11 '12 at 21:24
-
@cha0site: thanks... let's see; the OP is welcome to ask for clarification, in which case I'll be happy to expand. – Kerrek SB Feb 11 '12 at 21:25
-
@KerrekSB thanks a lot for the detailed answer I think I get it now. Answer by ZarakiKenpachi below was very helpful as well. So I guess gcc gets information about my hardware before generating the assembly or there is a separate compiler for each type of hardware ? – Cemre Mengü Feb 11 '12 at 21:30
-
@Cemre: GCC is split internally into a language frontend (e.g. C, C++, Fortran) and a hardware backend (x86, PPC, ARM, etc.), but all this is compiled into one fixed compiler binary. You have to build the entire compiler suite for the desired target architecture, and so the resulting program binary for your source code is determined by the actual compiler that you choose. You need an ARM-compiler for ARM binaries, an x86 compiler for x86 binaries, etc. Compiling for a platform that isn't your own is called "cross-compiling". – Kerrek SB Feb 11 '12 at 21:35
-
@Cemre: What GCC does (internally) is transform the code into a language agnostic and hardware agnostic internal representation called Abstract Syntax Tree by what is called a front-end, of which there is one for every language. This AST is then passed to a machine-specific back-end which generates assembly. However, like Kerrek said, you can't build just the front-end or just the back-end, you can only use the entire compiler (this is just how GCC was designed, it isn't an absolute restriction). – cha0site Feb 11 '12 at 21:55
-
@cha0site: To name an example, Clang is the C-family language frontend, which does the first preprocessing/compilation (to an intermediate representation called llvm IR), and then hands the generated IR over to llvm which generates assembly and hands that over to the system linker (mostly ld/gold on *nix systems, link.exe on Windows, though there is an llvm linker project going on). – Xeo Feb 12 '12 at 00:24
-
@KerrekSB, is it correct to say that steps 1-4 are those that each translation unit undergoes independently, whereas step 5 is the only one where multiple translation units (well, their corresponding `.o` files) flow into a single executable? – Enlico Sep 18 '22 at 14:56
-
@Enlico: Yes, that's about right. There's also "link-time optimization" nowadays that blurs the lines a bit, but without that, yes, linking is when all the translation units are, well, linked. – Kerrek SB Sep 20 '22 at 01:22
So, building an executable with GCC consists of 4 stages:
1.) Preprocessing (`gcc -E main.c > main.i`; transforms *.c to *.i): expands includes, processes macros, and removes comments.
2.) Compilation (`gcc -S main.i`; transforms *.i to *.s, if successful): compiles the C code to assembly (for an x86 target it is x86 assembly, for an x86_64 target it is x86-64 assembly, for an ARM target it is ARM assembly, etc.). Most warnings and errors are reported during this stage.
3.) Assembly (`as main.s -o main.o`; transforms *.s to *.o, again if successful): assembles the generated assembly into machine code. The object file still contains relative addresses of procedures and the like.
4.) Linking (`gcc main.o`): replaces relative addresses with absolute addresses and removes useless text. Linking errors and warnings are reported during this stage. In the end (if successful), we get an executable file.
So, to answer your question, the intermediate output you mean is assembly language; see the wiki article on assembly language.

Here's a graphic representation of the GCC compilation steps, courtesy of Red Hat Magazine:
Contrary to what other answers imply, there's no assembly step - rather, generating assembler code replaces the object code generation; it doesn't make much sense to convert an in-memory representation to a textual one if what you really want is a binary representation.

-
Well, yes, it doesn't make much sense to generate mnemonics, which are intended for humans to read, if you're going to make object code anyway. But mnemonics are almost 1:1 with object code, and the code generation part of the assembler _is_ done (calculating jump addresses, that sort of thing). – cha0site Feb 11 '12 at 22:00
It must be assembly code. You can get it by passing the `-S` flag on the command line when compiling.

There is no "intermediate output". The first output you get is machine code. (Although you can get C/C++ intermediate output by invoking only the preprocessor with `-E`.)
