Assembly language and compiled languages

Question

How is assembly faster than compiled languages if both are translated to machine code?

I'm talking about truly compiled languages, which are translated directly to machine code, not C# or Java, which are compiled to an intermediate language first and then translated to native code at run time by a JIT compiler or interpreter.

On Wikipedia, I found the passage below, and I'm not sure whether it is related. Is it because translation from a higher-level language generates extra machine code? Or is my understanding wrong?

A utility program called an assembler is used to translate assembly language statements into the target computer's machine code. The assembler performs a more or less isomorphic translation (a one-to-one mapping) from mnemonic statements into machine instructions and data. This is in contrast with high-level languages, in which a single statement generally results in many machine instructions.

Note that most high-level language compilers first compile to assembly code, which is then assembled by a separate assembler. So an (optimal) assembly program can never be slower than compiled source. If your (non-optimal) assembly is slower, you could just replace your code with the compiler-generated code and do something more useful than trying to write efficient assembler. – Gunther Piez, Dec 08 '09 at 11:45

Joey · Accepted Answer · 2010-02-14T09:17:39.527

28

Well, it relates a bit to your question, indeed. The point is that compilers produce inefficient machine code at times for various reasons, such as not being able to completely analyze your code, inserting automatic range checks, automatic checks for objects being null, etc.

On the other hand, if you write assembler code by hand and know what you're doing, then you can probably write some things much more efficiently than the compiler, although the compiler's behavior can be tweaked and you can usually tell it not to do range checking, for example.

Most people, however, will not write better assembler code than a compiler, simply because compilers are written by people who know a good deal of really weird but really cool optimizations. Also things like loop unrolling are usually a pain to write yourself and make the resulting code faster in many cases.
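As a rough illustration of what loop unrolling looks like when done by hand, here is a sketch in C++. The names `sum_simple` and `sum_unrolled4` are hypothetical, and the factor of four is an arbitrary choice; a compiler picks its own factor (or none) depending on the target.

```cpp
#include <cstddef>

// Straightforward loop: one addition and one loop test per element.
long sum_simple(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Manually unrolled by four: one loop test per four elements, with four
// independent accumulators so the additions do not depend on each other.
long sum_unrolled4(const int* data, std::size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    long total = s0 + s1 + s2 + s3;
    // Remainder loop for the last 0 to 3 elements.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The unrolled version is longer and needs a remainder loop, which is exactly the kind of bookkeeping that is tedious to write by hand and that a compiler can do for you.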

While it's generally true that everything that a computer executes is machine code, the code that runs differs greatly depending on how many abstraction levels you put between the machine and the programmer. For Assembler that's one level, for Java there are a few more ...

Also many people mistakenly believe that certain optimizations at a higher abstraction layer pay off at a lower one. This is not necessarily the case and the compiler may just have trouble understanding what you are trying to do and fail to properly optimize it.
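A classic example of such a source-level "optimization" is the XOR swap. The sketch below (function names are made up for illustration) shows the plain and the "clever" version side by side. With an optimizing compiler the plain version is typically compiled at least as well, since the temporary can simply live in a register, but that is an assumption you should check in the generated assembly for your own compiler.

```cpp
// Plain swap: easy for the compiler to analyze.
void swap_plain(unsigned& a, unsigned& b) {
    unsigned tmp = a;
    a = b;
    b = tmp;
}

// XOR swap: avoids the temporary in the source, but creates a chain of
// three dependent operations and breaks when both references alias.
void swap_xor(unsigned& a, unsigned& b) {
    if (&a == &b) {
        return;  // without this guard, x ^= x would zero the value
    }
    a ^= b;
    b ^= a;
    a ^= b;
}
```

Both produce the same result for distinct variables; the second one only looks faster.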

edited Feb 14 '10 at 09:17

answered Dec 08 '09 at 11:21

Joey


So does that mean assembly code is not usually faster than compiled code (C/C++)? – akif Dec 08 '09 at 11:28
Hand-written assembly code can be faster than compiled code if you *know what you're doing*. Most of the time, it won't. – R. Martinho Fernandes Dec 08 '09 at 11:31
Depends on the person who's writing it. See yu_sha's answer, which nicely summarizes that. Compilers actually have fixed sets of rules for how to write certain things more efficiently; those were ultimately created by people. People who may be able to adapt those things to other situations as well and produce more efficient code than a compiler can. But in many, many cases the compiler is much better at such things. For those snippets where it fails you can still resort to inline assembler, but don't be surprised if it runs slower than the original code if you switch to a better/newer compiler. – Joey Dec 08 '09 at 11:31
I usually compare it to the difference between a car with a manual transmission or an automatic transmission. People say that a manual transmission gets better fuel economy, and that is true *if the driver is very skilled*, because it gives him better control, and that control allows a skilled driver to shift in the best way. If the driver does not know exactly what he is doing, then that finer control means he will actually do a worse job than the automatic would have. – Crashworks Dec 08 '09 at 11:32
@Crashworks: Great analogy. I do feel like writing bad assembly when I'm driving :( – R. Martinho Fernandes Dec 08 '09 at 11:36
@Crashworks: But driving with an automatic transmission is not nearly as exciting... – PhiS Dec 08 '09 at 19:53
By the way, it's fairly easy to take a disassembly of compiled code, hand-optimize the assembler a bit while benchmarking, and outdo the compiler. It is harder to do this for all 40000 procedures. – Marco van de Voort Dec 08 '09 at 22:39
I've been playing with code since I was 11 (28 years ago) and I remember assembly code always produced a much smaller file which ran far faster than an equivalent written in Turbo C++. I think that was because the headers bloated the resulting machine code. I'm not sure if that's still the case though. – iuppiter Apr 10 '16 at 10:34
@iuppiter: Modern compilers are *much* better than 28 years ago, and we're not compiling for segmented x86 anymore. Flat memory models are easier to optimize for. You can still often beat compilers on a local scale (for a hot loop), though; missed optimizations are still unfortunately very common, some of them important and some of them not. See [Why is this C++ code faster than my hand-written assembly for testing the Collatz conjecture?](https://stackoverflow.com/questions/40354978/why-is-this-c-code-faster-than-my-hand-written-assembly-for-testing-the-collat/40355466#40355466) :) – Peter Cordes Dec 02 '17 at 12:51

score 9 · Answer 2 · edited Dec 09 '19 at 18:02

Assembly may sometimes be faster than a compiled language if an assembly programmer writes better assembly than that generated by the compiler.

A compiled language is often faster than assembly because programmers who write compilers usually know the CPU architecture better than programmers who are utilizing assembly in a one-off, limited-case, situation.

edited Dec 09 '19 at 18:02

the_endian


answered Dec 08 '09 at 11:23

yu_sha


It's the other way around: handwritten assembly is often slower than compiled source, because the wannabe assembler programmer just doesn't have a clue. – Gunther Piez Dec 08 '09 at 11:41
No, some programmers know the CPU architecture as well as compiler writers do. However, the optimization is so difficult a problem that the computer (running the compiler) will often do better than clever programmers. – Basile Starynkevitch Dec 02 '17 at 12:42
Beating the compiler with hand-written asm is often possible for a single hot loop. But at a large scale, constant propagation and various inlining possibilities make it unmaintainable to use asm for more than that. Ideally you can hand-hold a compiler into making nice asm by tweaking the source, giving you the best of both worlds. (Good asm now, *and* in the future for different CPUs, or different use-cases or surrounding code). See [C++ code for testing the Collatz conjecture faster than hand-written assembly - why?](//stackoverflow.com/a/40356449) for discussion of that. – Peter Cordes Dec 10 '19 at 02:26
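One small example of hand-holding the compiler through the source, sketched here in C++: in a Collatz step counter, declaring the value as unsigned tells the compiler it can never be negative, so `n / 2` can become a single right shift. With a signed type, division by 2 has to round toward zero, which needs extra instructions for negative values. The function name `collatz_steps` is hypothetical and this is not the code from the linked answer.

```cpp
#include <cstdint>

// Counts Collatz steps until n reaches 1 (0 and 1 take zero steps).
// The unsigned type is the "hint": n / 2 and n % 2 can be compiled to a
// plain shift and a bit test, with no fix-up for negative values.
// Overflow of 3 * n + 1 is not handled in this sketch.
unsigned collatz_steps(std::uint64_t n) {
    unsigned steps = 0;
    while (n > 1) {
        if (n % 2 == 0) {
            n /= 2;
        } else {
            n = 3 * n + 1;
        }
        ++steps;
    }
    return steps;
}
```

The source stays portable and readable, and the compiler remains free to pick the best instructions for whatever CPU you target later.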

score 4 · Answer 3 · edited Dec 02 '17 at 12:16

An assembly expert may be able to write assembly code that is more effective (fewer instructions, more efficient instructions, SIMD, ...) than what a compiler generates automatically.

However, most of the time, you're better off trusting the optimizer of your compiler.

Learn what your compiler does. Then let the compiler do it.

edited Dec 02 '17 at 12:16

Peter Mortensen


answered Dec 08 '09 at 11:20

Gregory Pakosz


Moreover, an assembly expert will be able to fully use the processor's registers and instruction extensions to perform most of the computation without accessing external memory, thus providing a noticeable boost. Of course, it depends on whether you target one or more processors... – Laurent Etiemble Dec 08 '09 at 11:24
@Laurent: and it also depends on your compiler. There's no reason a specialized compiler couldn't do that. – R. Martinho Fernandes Dec 08 '09 at 11:26
I thought about linking those slides. That's actually a nice collection of clever tricks compilers know about but many people who try writing assembly for performance reasons don't. – Joey Dec 08 '09 at 11:28
The trouble (and reason for writing assembly) is that there are lots of things the compiler could do but doesn't. – Crashworks Dec 08 '09 at 11:34
The trouble with a lot of people writing assembly is that they often do it even before profiling their application :) – Gregory Pakosz Dec 08 '09 at 11:55
@LaurentEtiemble. `gcc -march=native`, `clang -march=native` or `icc -xHOST` already does that. They do know how/when to use most of the x86 instruction-set extensions. gcc makes pretty good use of BMI1/BMI2 for example, and of course can use SSE-whatever and AVX1/2/512 when autovectorizing. Some compilers can even notice some kinds of popcount C implementations and compile them into a `popcnt` instruction, without having to use an intrinsic. – Peter Cordes Dec 22 '16 at 07:42
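For reference, this is the kind of portable popcount loop that comment is talking about, sketched in C++. The name `popcount_loop` is made up. Whether a particular compiler and flag combination turns this loop into a single `popcnt` instruction is an assumption to verify by reading the generated assembly.

```cpp
#include <cstdint>

// Counts set bits by clearing the lowest set bit on each iteration.
// The loop runs once per set bit, not once per bit position.
unsigned popcount_loop(std::uint64_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;  // clears the lowest set bit
        ++count;
    }
    return count;
}
```

If the compiler recognizes the idiom, you get the hardware instruction without writing an intrinsic or any assembly; if not, the loop is still correct everywhere.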
That article on compiler tricks is nice. It misses the best sequence for testing if sign bits are equal, though (for CPUs with fast `setcc`, i.e. anything from the last 10 years). xor-zero eax / `xor %edi, %esi` / `setns %al`. Unfortunately no modern compilers generate it either. That has 2 cycle latency from either input to the result (any recent Intel or AMD CPU), and is only 3 total instructions. The xor-zero is off the critical path. (Surprised the comment on SunCC's `test %esi,%esi` calls it "smarter" when `xor` already set flags according to the result so it's totally redundant.) – Peter Cordes Dec 22 '16 at 08:05
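In source form, the sign-bit test from that comment can be written as below (a C++ sketch; the name `same_sign` is hypothetical). XOR-ing the two values leaves the sign bit clear exactly when both inputs have the same sign bit, which is what the xor / `setns` sequence computes. Which instructions a compiler actually emits for it may differ.

```cpp
#include <cstdint>

// True when a and b have equal sign bits. Zero counts as non-negative
// here, because its sign bit is clear.
bool same_sign(std::int32_t a, std::int32_t b) {
    return (a ^ b) >= 0;
}
```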

score 2 · Answer 4 · answered Dec 08 '09 at 11:51

My standard answer when questions about assembly vs. high-level come up is to take a look at Michael Abrash's Graphics Programming Black Book.

The first couple of chapters give a good idea of what you can optimise effectively using assembly, and what you can't.

You can download it from GameDev - Jeff's links seem to be broken now unfortunately.

score 2 · Answer 5 · answered Dec 08 '09 at 13:38

All good answers. My only additional point is that programmers tend to write a certain number of lines of code per day, regardless of language. Since the advantage of a high-level language is that it lets you get more done with less code, it takes incredible programmer discipline to actually write less code.

This is especially an issue for performance, because performance matters almost nowhere except in a tiny part of the code. It only matters in your hotspots: code that (1) you write, (2) consumes a significant fraction of execution time, and (3) does not call functions.

score 1 · Answer 6 · edited Dec 02 '17 at 12:17

First, assembler should be used only in small pieces of code that eat most of the CPU time in a program (some kind of calculation, for example), that is, in the bottleneck of the algorithm.

Second, it depends on the ASM experience of whoever implements that code in assembler. If experience is low, the assembler implementation of the bottleneck code will be slower, and it will contain a lot of bugs. If experience is high enough, ASM will give a significant gain.

edited Dec 02 '17 at 12:17

Peter Mortensen


answered Dec 08 '09 at 11:21

user224564


And pointing out that "If experience is high enough" is rarer than most people think: compilers are smarter than most people who think themselves smarter than the compiler. – R. Martinho Fernandes Dec 08 '09 at 11:25

score 1 · Answer 7 · answered Dec 08 '09 at 11:31

First of all, compilers generate very good (fast) assembly code.

It's true that compilers can add extra code, since high-level languages have mechanisms, such as virtual methods and exceptions in C++, that require the compiler to produce more code. There are cases where raw assembly could speed up the code, but that's rare nowadays.
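A minimal sketch of where that extra code comes from, using a virtual method (all type and function names here are made up for illustration). Calling through the base class normally means loading the function address from the object's vtable and making an indirect call, unless the compiler can prove the dynamic type and devirtualize.

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

// Dynamic dispatch: the callee is looked up at run time, which usually
// costs an extra load plus an indirect call and blocks inlining.
double area_via_base(const Shape& s) {
    return s.area();
}

// Static call: the type is known, so this is trivially inlinable.
double area_direct(const Square& s) {
    return s.side * s.side;
}
```

The overhead buys flexibility; it only matters if such a call ends up in a hot loop.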

Basile Starynkevitch · Answer 8 · 2017-12-02T12:47:41.300

How is assembly faster than compiled languages if both are translated to machine code?

The implicit assumption is hand-written assembly code. Of course, most compilers (e.g. GCC for C, C++, Fortran, Go, D, etc.) generate assembler code themselves; for example, you might compile your `foo.cc` C++ source code with `g++ -fverbose-asm -Wall -S -O2 -march=native foo.cc` and look into the generated `foo.s` assembler code.
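If you want something small to try this on, a hypothetical `foo.cc` could contain just one function like the following; the function `dot` is an assumption for illustration, not something from a real project.

```cpp
// foo.cc (hypothetical contents)
// Compile with: g++ -fverbose-asm -Wall -S -O2 -march=native foo.cc
// and then read the generated foo.s.
#include <cstddef>

// Dot product of two float arrays of length n.
float dot(const float* a, const float* b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Changing the optimization level or the `-march` value and comparing the resulting `foo.s` files gives a good feel for how much the compiler does on its own.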

However, efficient assembler code is so difficult to write that, today, compilers can optimize better than humans do. See this.

So practically speaking, it is not worth coding in assembler (also take into account that development effort very often costs much more than the hardware running the compiled code). Even when performance matters a lot and is worth spending a lot of money on, it is better to hand-code only a very few routines in assembler, or even to embed some assembler code in some of your C routines.
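Embedding assembler in a routine could look roughly like this sketch, which assumes GCC or Clang extended inline asm syntax on x86 and falls back to plain C++ elsewhere. The function name `add_u32` is hypothetical, and a single add is of course not something worth writing in assembler; it only shows the mechanics.

```cpp
#include <cstdint>

// Adds two 32-bit values, using one inline x86 instruction when the
// compiler and target support GCC-style extended asm (an assumption).
std::uint32_t add_u32(std::uint32_t a, std::uint32_t b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // AT&T syntax: add operand 1 (b) into operand 0 (a).
    // "+r" marks a as read and written; "cc" says the flags are clobbered.
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;  // portable fallback for other compilers and targets
#endif
}
```

Keeping a portable fallback next to the asm makes it easy to benchmark both and to drop the asm again if a newer compiler does better.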

Look into the CppCon 2017 talk: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”
