I have started learning C++, and I have learned that a compiler turns source code from a program into machine code through compilation.

However, I've learned that C++ compilers actually translate the source code into Assembly as an interim step before translating the Assembly code into machine code. What is the purpose of this step?

Telescope
  • assembly code is machine code. Edit : My bad, it's not. – user Jul 02 '20 at 16:51
  • @user, no, it is not. Assembly usually has a one-to-one relation with machine code, but it is NOT machine code. – ChrisMM Jul 02 '20 at 16:52
  • Maybe so that you can see how the compiler generated code for a function. – Thomas Matthews Jul 02 '20 at 16:52
  • @ChrisMM They do not have a 1-to-1 relation. – Thomas Jager Jul 02 '20 at 16:53
  • @ThomasJager, I was editing to say "usually" :P – ChrisMM Jul 02 '20 at 16:53
  • Compilers may skip the generation of assembly code and emit machine code directly. There is no requirement for compilers to generate assembly code. – Thomas Matthews Jul 02 '20 at 16:53
  • I have had the compiler generate interwoven assembly code so I can easily debug the code; other times I'm trying to figure out how to make the compiler use some specialized processor instructions. – Thomas Matthews Jul 02 '20 at 16:55
  • **Where** did you hear that? It’s (in general) totally false. – Konrad Rudolph Jul 02 '20 at 17:04
  • Good info at [Why do compilers produce assembly code?](https://cs.stackexchange.com/q/14749). – Fred Larson Jul 02 '20 at 17:05
  • @KonradRudolph: For one of the common compilers it is true: gcc! Definitely "in general" is false :-) – Klaus Jul 02 '20 at 17:08
  • For the user votes to close: The question is not asking "why" vendors are using intermediate assembly, which is quite clear opinion based. But OP asks for the purpose, which can be answered and has nothing to do with opinions! – Klaus Jul 02 '20 at 17:31
  • Compilers usually translate the source language to some intermediate representation that they then optimize. Then they generate machine code or ASM, from the IR, depending on what you ask for. – Jesper Juhl Jul 02 '20 at 18:08

2 Answers

Why don't they translate it directly into machine code?

First of all: There is no need to write an intermediate assembly language representation. Every compiler vendor is free to emit machine code directly.

But there are many good reasons to "write" intermediate assembly and pass it to an assembler to generate the final executable file. Importantly, nothing actually has to be written to a file on disk; the output can be piped directly to the assembler.
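As a rough illustration of those stages, here is a small sketch that assumes a GCC toolchain on a Unix-like system. The file name `square.cpp` and the function `square` are made up for this example, and other compilers may organize these steps differently or skip the assembly text entirely.

```cpp
// square.cpp (hypothetical file name, used only for this illustration)
//
// Assuming GCC on a Unix-like system, the stages can be run one at a time:
//
//   g++ -S square.cpp -o square.s   # stop after generating assembly text
//   as square.s -o square.o         # assemble that text into an object file
//
// or in a single step, where the driver runs the assembler for you:
//
//   g++ -c square.cpp -o square.o   # compile and assemble, do not link
//
// The -pipe option asks GCC to hand the assembly to the assembler through
// a pipe instead of a temporary file. The -save-temps option keeps the
// intermediate files so they can be inspected afterwards.

// A plain, non-inline function, so the compiler emits code for it even
// though nothing in this file calls it.
int square(int x) {
    return x * x;
}
```

Opening `square.s` in a text editor shows the assembly the compiler produced for `square`. Linking is a further step and needs a translation unit that defines `main`.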

Some of the reasons why vendors use an intermediate assembly language:

  • The assembler is already available and "knows" how to generate some executable file formats (ELF, for example).

  • Some tasks can be postponed until the assembly level is reached, resolving jump targets for example. This is possible because the intermediate assembly is often not just a 1:1 representation of machine code but some kind of "macro assembler" language, which can do a lot more than simply create bits from mnemonics.

  • The assembler stage is followed by running the linker. This would also have to be done if the compiler wanted to create executable files directly, and coding it all again would duplicate a lot of work. For example, all the relocation of previously "unknown" addresses must be done on the way to an executable file. Simply use the existing assembler and linker and the job is done.

  • The intermediate assembly is always useful for debugging purposes. So there is a more or less hard requirement to be able to produce it, even if the step can be skipped when the user requests no debug output.

I believe there are a lot more...
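To make the "resolving jump targets" point a bit more concrete, here is a minimal sketch of the classic two-pass approach an assembler can take. The toy instruction format, the `Line` and `Encoded` types, and the `assemble` function are all invented for this illustration; real assemblers are far more involved, and each instruction here is assumed to occupy exactly one address.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One line of a hypothetical toy assembly language.
struct Line {
    std::string label;   // non-empty if this line defines a label
    std::string opcode;  // non-empty if this line holds an instruction
    std::string target;  // label operand of a jump, empty otherwise
};

// An instruction whose label operand has been replaced by an address.
struct Encoded {
    std::string opcode;
    int target_address;  // -1 when the instruction has no label operand
};

std::vector<Encoded> assemble(const std::vector<Line>& program) {
    // Pass 1: give every instruction an address and remember where each
    // label points. Forward references are not a problem yet, because
    // nothing is resolved in this pass.
    std::map<std::string, int> labels;
    int address = 0;
    for (const Line& line : program) {
        if (!line.label.empty()) labels[line.label] = address;
        if (!line.opcode.empty()) ++address;
    }

    // Pass 2: every label is known now, so jump targets can be filled in.
    std::vector<Encoded> out;
    for (const Line& line : program) {
        if (line.opcode.empty()) continue;
        int target = -1;
        if (!line.target.empty()) {
            auto it = labels.find(line.target);
            if (it == labels.end()) {
                throw std::runtime_error("undefined label: " + line.target);
            }
            target = it->second;
        }
        out.push_back({line.opcode, target});
    }

    // Both passes must agree on how many instructions there are.
    assert(out.size() == static_cast<std::size_t>(address));
    return out;
}
```

A compiler that emits text like `jmp end` can leave the address of `end` to this kind of pass instead of tracking it itself, which is the sort of work the answer describes as postponed to the assembly level.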

The downside is:

  • "Writing" a text representation and parsing the program back from that text takes longer than passing the information directly to the linker.
Klaus

Usually, compilers invoke the assembler (and the linker, or the archiver) on your behalf, unless you ask them to do otherwise, because it is convenient.

But separating the distinct steps is useful because it allows you to swap the assembler (and linker and archiver) for another if you so desire or need to. And conversely, this assembler may potentially be used with other compilers.

The separation is also useful because assemblers already existed before compilers did. By using a pre-existing assembler, there is no need to re-implement the machine code translation. This is still potentially relevant because occasionally there will be a need to bootstrap a new CPU architecture.

eerorika