1

If I'm not mistaken, all compilers ship with an assembler (`ml`, `as`, etc.) and use it in the background to translate high-level code into machine language (C/C++ code -> asm code -> machine code). But do modern compilers still work like that, or do they compile the high-level source code directly into machine code? In short, does MSVC use `ml.exe`, or GCC use `as`, in the background?

LambdaCore
  • gcc invokes `as`. You can use `gcc -v` to see what it is doing. – Jester Feb 11 '23 at 11:37
  • LLVM compiles directly to machine code, but it can be configured to use an assembly intermediate step instead. Turbo C also compiled directly to machine code, as does tcc. – fuz Feb 11 '23 at 11:40
  • The job and purpose of a build system is to produce [executables](https://en.wikipedia.org/wiki/Executable). A build system is composed of compilers and linkers. A compiler's job is to produce machine code, usually [object code](https://en.wikipedia.org/wiki/Object_code), due to the nature of separate compilation; object code is the input to the linker. Whether assembly language is used as an intermediate is up to the specific toolchain. Some do, some don't; it isn't and has never been strictly necessary. – Erik Eidt Feb 11 '23 at 16:04
  • But most build systems will include an assembler even if they don't use it internally in the compiler, so that they can handle assembly code, source or generated. – Erik Eidt Feb 11 '23 at 16:06
  • a compiler, assembler and linker comprise a toolchain, but the compiler obviously does not have to produce assembly. It seems like the more sane/modular solution as you have the assembler and you can much more easily "see" the compiler output rather than some bucket of bits (that would take yet another tool to help debug, a tool that for most isas is marginally accurate). but it is not required and not every compiler is designed to have an assembly language step as a requirement or option. – old_timer Feb 12 '23 at 02:48
  • many if not all compilers boil the high-level language down into some sort of internal "code" or set of structures, which is an intermediate step before the back end generates target code. With llvm, for example, that middle information can itself be output as a file, has its own programming language, etc. So with clang/llvm you can have high level, bytecode, bytecode in an ASCII form, assembly language, object, and link, or... you can go from high level to object code, and then link. You can optimize at each level. – old_timer Feb 12 '23 at 02:51
  • (although you only get one linker if any even though the rest of the toolchain is capable of producing code for different targets) – old_timer Feb 12 '23 at 02:51

2 Answers

3

It varies.

  • gcc does use the external `as` program. Not "in the background", but as a separate pass operating on a temporary `.s` file written by the compiler. Or, if you use the `-pipe` option, in a pipeline. You can see the `as` command that is run if you compile with `gcc -v`.

  • clang has an "integrated assembler" which is used by default instead of `as`. However, if you switch it off with `-fno-integrated-as`, then it will run `as` separately and you can see this in `clang -v` output.

  • I believe that MSVC does not use a separate assembler, but I am not certain of this.

Note that if a compiler is going to support inline asm (as gcc and clang both do), then it can't very easily skip an assembler pass completely. Some stage of the process still has to know how to assemble every instruction mnemonic into machine code. In some cases, inline asm might expect to be able to interact with asm defined elsewhere in the file, and this is hard to support unless you have a pass where you truly generate the entire module into assembly, or at least into some pre-parsed asm-equivalent internal representation.

MSVC does not support inline assembly on x64, so it would not have this issue. Indeed, this might have been part of the reason not to support it.
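To make the inline asm point concrete, here is a minimal sketch of a GNU C `asm` statement. The function names `add_ints` and `demo_add_ints` are made up for illustration, and the asm path assumes an x86-64 target with gcc or clang; on anything else the sketch falls back to plain C. The compiler only fills in the operand placeholders, so some later stage still has to know how to turn the `addl` mnemonic into machine code.

```c
#include <assert.h>

/* Adds b to a. On x86-64 with gcc or clang the addition is written as GNU C
   inline asm; elsewhere it falls back to plain C so the file still builds.
   (add_ints is a hypothetical name used only for this illustration.) */
int add_ints(int a, int b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* %0 and %1 are placeholders. The compiler substitutes the registers it
       chose, and the resulting text still has to be assembled by something. */
    __asm__("addl %1, %0"
            : "+r"(a)   /* operand 0: read and written, kept in a register */
            : "r"(b)    /* operand 1: input only, kept in a register */
            : "cc");    /* the add modifies the flags */
    return a;
#else
    return a + b;       /* fallback for other compilers and targets */
#endif
}

/* Hypothetical self-check, only here to show a call. */
void demo_add_ints(void)
{
    assert(add_ints(2, 3) == 5);
}
```

Compiling this with `gcc -S` shows the `addl` line passed through into the generated `.s` file with the placeholders replaced by register names.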

So it really just comes down to a design decision. There are some benefits to compiling directly to machine code:

  • better compilation performance,

  • it might make certain micro-optimizations easier

and some benefits to an external assembler:

  • avoids reinventing the wheel, if the system already has a working assembler

  • separation of concerns: the compiler doesn't have to know anything about machine code or object file format, the assembler doesn't have to know anything about the compiler's IR

  • easier to ensure 100% compatibility with code written for the existing assembler. For instance, clang occasionally has issues building source written for gcc/gas if it contains inline asm using obscure gas features, since the clang integrated assembler doesn't always support them compatibly.
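The trade-off can be sketched with a toy back end that supports both routes. Everything here is hypothetical (the `toy_insn` representation and the `toy_encode` and `toy_print` names are not taken from any real compiler); it only handles two x86 instructions, `mov eax, imm32` (opcode `B8` followed by a little-endian 32-bit immediate) and `ret` (opcode `C3`). The direct route has to know instruction encodings, while the text route only has to know assembler syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical "final instruction representation" with two x86 forms. */
enum toy_op { TOY_MOV_EAX_IMM32, TOY_RET };
struct toy_insn { enum toy_op op; uint32_t imm; };

/* Direct route: write machine code bytes into buf.
   Returns the number of bytes written, or 0 if they would not fit. */
size_t toy_encode(const struct toy_insn *in, uint8_t *buf, size_t cap)
{
    assert(in != NULL && buf != NULL);
    switch (in->op) {
    case TOY_MOV_EAX_IMM32:              /* B8 id: mov eax, imm32 */
        if (cap < 5) return 0;
        buf[0] = 0xB8;
        for (int i = 0; i < 4; i++)      /* immediate is little-endian */
            buf[1 + i] = (uint8_t)(in->imm >> (8 * i));
        return 5;
    case TOY_RET:                        /* C3: ret */
        if (cap < 1) return 0;
        buf[0] = 0xC3;
        return 1;
    }
    return 0;
}

/* Text route: write AT&T-syntax assembly for an external assembler.
   Returns what snprintf returns (length of the text), or -1 for an
   unknown instruction. */
int toy_print(const struct toy_insn *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    switch (in->op) {
    case TOY_MOV_EAX_IMM32:
        return snprintf(out, cap, "\tmovl\t$%lu, %%eax\n",
                        (unsigned long)in->imm);
    case TOY_RET:
        return snprintf(out, cap, "\tret\n");
    }
    return -1;
}
```

For `mov eax, 42` followed by `ret`, the direct route yields the six bytes `B8 2A 00 00 00 C3`, while the text route yields two lines that an assembler would later turn into those same bytes.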

Nate Eldredge
  • MSVC doesn't use a separate assembler, and from what I've read, the asm listings it can emit aren't ready to assemble into fully working object files you could actually link. (e.g. they contain extraneous definitions that will lead to name conflicts.) That's why its asm output option is called a "listing". https://learn.microsoft.com/en-us/cpp/build/reference/fa-fa-listing-file?view=msvc-170&viewFallbackFrom=vs-2019 – Peter Cordes Feb 11 '23 at 20:16
  • MSVC's inline `__asm {}` syntax has to *compile* that block, not just pass it on to the assembler. Or at least it has to parse it for C++ variable names and substitute those with addressing modes, and know which registers are written by every instruction (so it knows what to save/restore in a function that contains the asm block). MSVC inline asm blocks don't support full MASM stuff for that reason, e.g. no `db` to emit arbitrary machine code, and IDK if you can define labels to do crazy stuff like jump between asm statements (something that's possible but unsupported for GCC). – Peter Cordes Feb 11 '23 at 20:21
  • IIRC, clang uses its built-in assembler by default, and handles each GNU C `asm()` statement separately, not as part of one large file. IIRC you can't define a `.macro` in one and use it in another, unless you use `clang -fno-integrated-as` – Peter Cordes Feb 11 '23 at 20:22
  • My understanding of MSVC's decision not to support it on anything except x86 is that the internal implementation is really [brittle and hard to maintain, and isn't safe in functions with register args (even on x86)](https://stackoverflow.com/questions/3323445/what-is-the-difference-between-asm-asm-and-asm#comment59576185_35959859). And it was never a very good design, forcing a store/reload to get data into an asm statement. Providing intrinsics for everything is better than inefficient inline asm. Also probably a lot of people expected to be able to do crazy things behind the compiler's back – Peter Cordes Feb 11 '23 at 20:53
  • (BTW, I closed this question as a duplicate based on the question body, but then noticed that you're answering a more general interpretation of the question title, that they need an assembler to support inline asm even if they normally compile straight to machine code in object files.) – Peter Cordes Feb 11 '23 at 20:54
  • Related: https://rust-lang.github.io/rfcs/2873-inline-asm.html#implement-an-embedded-dsl discusses the fact that MSVC `__asm { .. }` syntax is basically a domain-specific language embedded inside C++, that a compiler has to understand to compile such code. (Unlike GNU C inline asm which is just `%operand` string substitution into a template and then emitting that text literally into the compiler's asm output to be assembled.) And that it's desirable not to have to write a new DSL for every architecture Rust supports. – Peter Cordes Feb 11 '23 at 20:58
  • Another benefit to compiling directly to machine code: if you use an assembler, "You never know where it's going to put things, so you'd have to use separate constants." https://web.archive.org/web/20180212083430/https://cboh.org/mel.txt – prl Feb 12 '23 at 05:37
1

Taken literally, a modern compiler doesn't need an assembler: it's easier and more efficient to convert the "final instruction representation" directly to machine code than it is to convert it to text.

The problem is that you're not looking at modern compilers. Both of the compilers you mentioned are 30 or more years old now (GCC was first released in 1987 and MSVC in 1993, according to their Wikipedia pages), and nobody likes new versions of old things that break compatibility.

The best-known modern compiler is probably Clang, but it's designed as a drop-in replacement for an old compiler (Clang tries to support the same command line args, inline assembly syntax, extensions, ... as GCC).

Essentially, someone writing a modern compiler has 3 choices:

a) break compatibility with ancient things and only generate machine code.

b) write/maintain more code to be able to generate machine code (as default for efficiency) and also be able to generate assembly/text (when requested via command line arguments)

c) avoid breaking compatibility and avoid writing/maintaining more code; and only generate assembly/text (despite the potential efficiency loss). Note that this can include seamlessly starting an external assembler (via `system()` maybe) so that the user doesn't need to deal with it themselves.
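A rough sketch of what choice c) could look like inside a compiler driver follows. The helper names `build_as_command` and `assemble_with_external_as` are invented for this example, and it assumes an assembler called `as` is on the PATH and accepts `-o <object> <source>`, as GNU `as` does. Real drivers locate the assembler, quote paths, and pass target options much more carefully.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Builds the command line for an external assembler named "as".
   Assumption: paths contain no spaces or shell metacharacters.
   Returns 0 on success, -1 if the command does not fit in cmd. */
int build_as_command(char *cmd, size_t cap,
                     const char *asm_path, const char *obj_path)
{
    assert(cmd != NULL && cap > 0);
    int n = snprintf(cmd, cap, "as -o %s %s", obj_path, asm_path);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Writes the generated assembly text to asm_path, then starts the external
   assembler to turn it into obj_path. Returns 0 on success, -1 on failure. */
int assemble_with_external_as(const char *asm_text,
                              const char *asm_path, const char *obj_path)
{
    char cmd[512];
    FILE *f = fopen(asm_path, "w");
    if (f == NULL)
        return -1;
    int write_failed = (fputs(asm_text, f) == EOF);
    if (fclose(f) != 0 || write_failed)   /* fclose always runs first */
        return -1;
    if (build_as_command(cmd, sizeof cmd, asm_path, obj_path) != 0)
        return -1;
    return system(cmd) == 0 ? 0 : -1;     /* user never sees this step */
}
```

A call such as `assemble_with_external_as("\tret\n", "t.s", "t.o")` would write `t.s` and run `as -o t.o t.s`, which is roughly the step that `gcc -v` makes visible.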

Brendan
  • "A compiler doesn't need an assembler, **but a compiler author might**". Choice 'a' seems arrogant and unrealistic. Most compilers **can** output IR forms. Is godbolt used by anyone? A compiler might have an option to go direct to binary, but there are many use cases where the **option** to generate assembler is valuable. I would call a compiler that cannot generate assembler deficient. – artless noise Feb 11 '23 at 20:10