Always do as much of the work as possible at assemble time (once per build), not at runtime, where it costs code size and costs time every time the block executes.
They both start with the same `mov $imm32, %eax` form of `mov`, but then the first version wastes 2 extra instructions, so it's total garbage with zero advantages, and looks super ugly to anyone used to thinking about efficiency.
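For instance, a minimal sketch of the contrast (assuming the shift count in question is the constant 4; register choices here are arbitrary):

```
# runtime construction: 3 instructions, recomputed every time this executes
mov  $1, %eax
mov  $4, %ecx        # shift count
shl  %cl, %eax       # EAX = 1<<4 = 16

# assemble-time: 1 instruction; the machine code is simply  mov $16, %eax
mov  $(1<<4), %eax
```

The assembler evaluates the expression once per build; the CPU never sees anything but the final immediate.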
There is no compiler to optimize your code into something the CPU can run efficiently; it's up to you to make that happen. If you don't care about performance, there's basically no reason to be messing around with asm in the first place [1], either writing it by hand or thinking about compiler output.
The fact that you even have to ask this is a sign you're either missing the point of assembly language (usually performance), or you're mistakenly thinking of asm the same way as you would a compiled language like C++. You need to adjust your mental model to think about the machine code you're creating, that the CPU will execute, and how to make that as efficient as possible.
You need to think like a compiler; "how can I do this in as few uops as possible for the front-end?" (https://agner.org/optimize), with minimum code-size in bytes as a tie-breaker. Or depending on your goals, maybe optimizing for code-size over speed. But anyway, compilers aggressively evaluate expressions and do constant-propagation as much as possible to combine constants in the source code into compile-time work instead of run-time.
Footnote 1:
In that case, write in a language that has a nice optimizing compiler, e.g. C or Rust, and let it create machine code for you. (Although to be fair, a few things are easier in asm than C if you know both equally well, such as extended-precision math. Very few high-level languages make it easy to use the carry-flag output of one operation as an input to another.)
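For example, a 128-bit add is just two instructions in x86-64 asm, because `adc` consumes the carry flag directly (a sketch assuming the two values live in RDX:RAX and RCX:RBX):

```
add  %rbx, %rax      # low halves; the carry-out lands in CF
adc  %rcx, %rdx      # high halves + carry: the part that's clumsy to express in C
```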
Readability:
You are 100% correct that readability is usually not the top priority in asm; it always takes a back seat to code-size and/or performance in any case where it's worth writing asm by hand in the first place. But within those constraints, we can certainly aim for as much readability as possible.
Your runtime computation way is extremely surprising to experienced asm users reading your code, and not idiomatic at all. If I came across that in otherwise-sane code, it would take me some time to double-check and make sure I was understanding it properly (e.g. maybe there's some non-constant input to this after all, or maybe this sets FLAGS a certain way that's also needed later).
The only reason to do work at run-time is when it couldn't have been done at compile time (because it's not constant), so it would be very surprising to see a shift whose input came from a `mov`-immediate. If I saw that sequence of 3 instructions to create a 32-bit constant in production code (not beginner questions on Stack Overflow), I'd be shocked at the incompetence of whoever wrote it, after figuring out it was just creating a 32-bit constant.
Apart from that, the runtime version is 2 more instructions to read, if this appears as part of a larger block of code. Code density (in terms of amount done per source line) is already low in asm, so minimizing instruction count is generally good for overall readability of a function.
(As well as usually being good for efficiency, except for cases like replacing a slow instruction like `div` with a multiplicative inverse + shift. But that's bad enough for readability that it's not too weird for hand-written asm to `mov` an immediate to a register and then `div` by it, if performance wasn't the top priority of that one function or block of code, e.g. because it doesn't run often. Unless the divisor is a power of 2, in which case it's just a really stupid, less convenient alternative to a right shift.)
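For the curious, the multiplicative-inverse version of unsigned division by 10 looks something like this (input assumed in EDI for this sketch; 0xCCCCCCCD is ceil(2^35 / 10), the constant compilers emit for `x/10`):

```
mov  $0xCCCCCCCD, %edx
mov  %edi, %eax
mul  %edx            # EDX:EAX = dividend * magic constant
shr  $3, %edx        # EDX = dividend/10  (total shift of 32+3 = 35 bits)
```

Fast, but you can see why it hurts readability compared to a plain `div`.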
`(1<<n) - 1` is a pretty common idiom that most experienced asm programmers are familiar with. See also https://catonmat.net/low-level-bit-hacks (Many people will also be familiar with binary tricks like this from low-level experience in other languages; it's definitely not unique to asm.)
So for this case specifically, I'd really say just get used to seeing stuff like `and $(1<<X) - 1, %eax`. Or `and $-16, %eax` as a convenient way to write an AND mask that zeros the low 4 bits, rounding EAX down to a multiple of 16. (Taking advantage of 2's complement.)
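Both idioms in context (hypothetical register choices; the assembler folds each expression into a single immediate):

```
mov  %edi, %eax
and  $(1<<4)-1, %eax  # keep only the low 4 bits: mask = $15
and  $-16, %edi       # zero the low 4 bits of EDI: -16 = 0xFFFFFFF0
```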
Macros
However, you can avoid repeating that expression everywhere you use it by defining an assemble-time constant like `XMASK = (1<<X) - 1` that you can use instead.
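For example (assuming `X = 3` is the shift count you need):

```
X = 3
XMASK = (1<<X) - 1    # assemble-time constant, evaluates to 7
and  $XMASK, %eax     # assembles to  and $7, %eax
```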
Or you can do something like

```
#define SHIFT2MASK(x_) ((1<<x_)-1)
...
X=3
mov $SHIFT2MASK(X), %eax
and $SHIFT2MASK(4), %ecx
```
and compile with `gcc -c foo.S` to run your asm source through the C preprocessor.
(GAS native macros work like instructions, not for single operands to other instructions, so a macro language like the C preprocessor is more convenient for this.)
The hard part with this approach is choosing a clear macro name that unambiguously conveys the fact that it turns a shift count into a mask with set bits up to that position. Not `0xfffffff0` or something, and not just `1<<4` either. For testing a bitmap, you would be doing stuff like `test $1<<3, %al`, and a mask could just as easily describe a value with 1 bit set at the appropriate position.
To be clear, `SHIFT2MASK` is not fully unambiguously named either; hopefully the context of how it's getting used makes it clear. Ideally it can be self-explanatory enough that the comments can be higher-level, describing the algorithm, not the nuts and bolts that the reader can already see in the code itself.