In assembly, how to add integers without destroying either operand?

Question

Using AT&T syntax on x86-64, I wish to assemble c = a + b; as

```
add %[a], %[b], %[c]
```

Unfortunately, GNU's assembler will not do it. Why not?

DETAILS

According to Intel's Software Developer's Manual, rev. 75 (June 2021), vol. 2, section 2.5,

VEX-encoded general-purpose-register instructions have ... instruction syntax support for three encodable operands.

The VEX prefix is an AVX feature, so x86-64 CPUs from Sandy Bridge/Bulldozer onward implement it. Those CPUs date from about ten years ago, so GNU's assembler ought to assemble my three-operand instruction, oughtn't it?

To clarify, I am aware that one can write it in the old style as

```
mov %[a], %[c]
add %[b], %[c]
```

However, I wish to write it in the new, VEX style. Incidentally, I have informed the assembler that I have a modern CPU by passing GCC the -march=skylake command-line option.

What is my mistake, please?

SAMPLE CODE

In a C++ wrapper,

```cpp
#include <cstddef>
#include <iostream>

int main()
{
    volatile int a{8};
    volatile int b{5};
    volatile int c{0};
    //c = a + b;
    asm volatile (
        //"mov %[a], %[c]\n\t"
        //"add %[b], %[c]\n\t"
        "add %[a], %[b], %[c]\n\t"
        : [c] "=&r" (c)
        : [a] "r" (a), [b] "r" (b)
        : "cc"
    );
    std::cout << c << "\n";
}
```

As for your question: have you tried using an lea instruction? — fuz, Nov 16 '21 at 21:08
@fuz No. That's a good point, because it explains why Intel and AMD would not bother to implement VEX for the `add` instruction. One suspects that future visitors would find your observation helpful; so, when you have some time, would you care to add it as a proper *answer?* — thb, Nov 17 '21 at 22:08
I think you have some sort of misunderstanding here. The VEX prefix is not applied to existing instructions outside of AVX/AVX2. All scalar instructions that take a VEX prefix are entirely new with new opcodes. The usual add instructions (opcodes `00` to `03`) cannot even be VEX encoded because the VEX encoding has an implicit `0f`, `0f 38`, or `0f 3a` prefix which these instructions lack. You cannot just take any random instruction and apply a VEX prefix to it. That's not how it works. — fuz, Nov 18 '21 at 00:01

score 8 · Accepted Answer · edited Nov 17 '21 at 05:12

Only a few specific GPR instructions have VEX encodings, primarily the BMI1/BMI2 instructions that were added after AVX already existed. See the list in Table 2-28, which has ANDN, BEXTR, BLSI, BLSMSK, BLSR, BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX; the same list appears in section 5.1.16.1. For example, the manual entry for andn lists only a VEX encoding, while the entry for plain and doesn't list any.
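To show what one of these VEX-encoded GPR instructions looks like in practice, here is a minimal sketch using ANDN from GNU inline asm. It assumes GCC or Clang on x86-64; the run-time `__builtin_cpu_supports("bmi")` check guards against CPUs without BMI1, and the plain C++ fallback keeps it compiling elsewhere. The function name and_not is hypothetical. Note the AT&T operand order, which is reversed from the Intel manual's.

```cpp
#include <cstdint>

// Hypothetical helper: returns ~a & b without overwriting a or b.
// Assumes GCC/Clang (GNU inline asm, __builtin_cpu_supports).
std::uint64_t and_not(std::uint64_t a, std::uint64_t b)
{
#if defined(__x86_64__)
    if (__builtin_cpu_supports("bmi")) {  // ANDN is a BMI1 instruction
        std::uint64_t c;
        // AT&T order: "andn src2, src1, dest" computes dest = ~src1 & src2,
        // written to a third register; both sources stay intact.
        asm("andn %[b], %[a], %[c]"
            : [c] "=r" (c)
            : [a] "r" (a), [b] "r" (b)
            : "cc");  // ANDN updates SF/ZF and clears CF/OF
        return c;
    }
#endif
    return ~a & b;  // portable fallback
}
```

For example, and_not(0xF0, 0xFF) yields 0x0F on either path.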

So Intel (unfortunately) didn't introduce a brand-new three-operand alternate encoding for the entire instruction set. They just introduced a few specific instructions that take three operands and use VEX to encode them. In some cases these have similar or equivalent functionality to an existing instruction, e.g. SHLX for SHL with a variable count, and so effectively provide a three-operand version of the previous two-operand instruction, but only in those special cases. There are no such equivalents across the board.
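As a sketch of SHLX acting as a three-operand SHL, the helper below (hypothetical name shift_left) assumes GCC or Clang on x86-64 and checks for BMI2 at run time. Unlike SHL, SHLX takes its count from any register rather than only CL, writes a separate destination, and leaves FLAGS untouched, so no "cc" clobber is declared.

```cpp
#include <cstdint>

// Hypothetical helper: returns x << (count & 63), the same count masking
// that SHL applies to a 64-bit operand. Neither input is overwritten.
std::uint64_t shift_left(std::uint64_t x, std::uint64_t count)
{
#if defined(__x86_64__)
    if (__builtin_cpu_supports("bmi2")) {  // SHLX is a BMI2 instruction
        std::uint64_t r;
        // AT&T order: "shlx count, src, dest" computes dest = src << count.
        asm("shlx %[count], %[x], %[r]"
            : [r] "=r" (r)
            : [x] "r" (x), [count] "r" (count));
        return r;
    }
#endif
    return x << (count & 63);  // portable fallback with identical masking
}
```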

The "old style" two-operand form remains the only version of the add instruction. However, as fuz points out in comments, lea can be a good way to add two registers and write the result to a third, subject to some restrictions: lea has no 8-bit operand size, and it doesn't set FLAGS, so it can't stand in for add when you need the carry or zero flag.
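A minimal sketch of the lea approach for the question's c = a + b, assuming GCC or Clang on x86-64 and 64-bit operands (the function name lea_add is hypothetical). The base + index addressing mode does the addition, and because the single instruction reads both inputs before writing the output, a plain "=r" is enough.

```cpp
#include <cstdint>

// Hypothetical helper: c = a + b with neither input destroyed.
std::int64_t lea_add(std::int64_t a, std::int64_t b)
{
#if defined(__x86_64__)
    std::int64_t c;
    // Address arithmetic (base a, index b, scale 1) computes a + b into a
    // third register. LEA doesn't touch FLAGS, so no "cc" clobber.
    asm("lea (%[a], %[b]), %[c]"
        : [c] "=r" (c)
        : [a] "r" (a), [b] "r" (b));
    return c;
#else
    return a + b;  // portable fallback
#endif
}
```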

See Using LEA on values that aren't addresses / pointers? for more general things LEA can do, like copy-and-add a constant to a register, or shift-and-add. Compilers already know this and will use lea where appropriate, any time it saves instructions. (With some tuning options, like -mtune=atom for the old in-order Atom, they will use lea even where they could have used add.)
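To illustrate the shift-and-add and add-a-constant forms, here is a sketch that computes a + b*4 + 10 with one lea, again assuming GCC or Clang on x86-64 (lea_scaled is a hypothetical name). The scale can only be 1, 2, 4 or 8, and the displacement must be a constant.

```cpp
#include <cstdint>

// Hypothetical helper: r = a + b*4 + 10 in a single instruction,
// leaving a, b and FLAGS unchanged.
std::int64_t lea_scaled(std::int64_t a, std::int64_t b)
{
#if defined(__x86_64__)
    std::int64_t r;
    // disp(base, index, scale): 10 + a + b*4
    asm("lea 10(%[a], %[b], 4), %[r]"
        : [r] "=r" (r)
        : [a] "r" (a), [b] "r" (b));
    return r;
#else
    return a + b * 4 + 10;  // portable fallback
#endif
}
```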

If more flexible encodings existed for common integer instructions other than add, such as and/xor/sub, gcc -O3 -march=skylake would already be using them in its own asm output, without any need for inline asm. Likewise, where an alternative instruction can do the job, like lea for add, gcc already uses it, so it makes sense to look at compiler output to see what tricks it knows. If you want to experiment yourself, a stand-alone .s file that just makes an exit system call (or that you simply single-step in a debugger) is a better playground, because it removes the complexity of inline asm. (GAS by default doesn't restrict instruction sets, and gcc -march=skylake doesn't pass that option on to the assembler, as.)
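As a sketch of the "look at compiler output" advice, the plain C++ functions below contain no inline asm at all (their names are hypothetical). Compiling with something like g++ -O2 -S and reading the .s file should show how the compiler handles them; on x86-64 GCC typically emits a single lea for each, though the exact output depends on compiler version and flags.

```cpp
// No inline asm: let the compiler choose the instructions.
// Typically compiles to something like: leal (%rdi,%rsi), %eax
int add_plain(int a, int b)
{
    return a + b;
}

// Typically compiles to something like: leal 10(%rdi,%rsi,4), %eax
int add_scaled(int a, int b)
{
    return a + b * 4 + 10;
}
```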

In your inline asm, your c operand should be output-only: =r instead of +r. The old value is overwritten, so there's no need to tell the compiler to provide it as an input. (As you said, you want c = a+b, not c += a+b.)

Using a single lea as the asm template means you don't need a =&r early-clobber output, because your asm reads all its inputs before writing that output. In your case, having c as an input/output was probably what stopped the compiler from choosing the same register as one of the inputs. Had it picked b's register, mov; add would have broken, because mov would overwrite b before add reads it.
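For comparison, a sketch of the two-instruction mov; add version with the constraints discussed above: c is output-only ("=" rather than "+") and early-clobber ("&"), so the compiler won't give it the same register as b. It assumes GCC or Clang on x86-64; the function name mov_add is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: c = a + b via mov + add in inline asm.
std::int64_t mov_add(std::int64_t a, std::int64_t b)
{
#if defined(__x86_64__)
    std::int64_t c;
    // mov writes c before add reads b, so c needs early-clobber ("=&r");
    // otherwise c could share b's register and b would be overwritten.
    asm("mov %[a], %[c]\n\t"
        "add %[b], %[c]"
        : [c] "=&r" (c)
        : [a] "r" (a), [b] "r" (b)
        : "cc");  // add modifies FLAGS
    return c;
#else
    return a + b;  // portable fallback
#endif
}
```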

Aha. You have just saved me a lot of time. I appreciate it. Makes sense, now. — thb, Nov 16 '21 at 20:31
@thb: I made some mostly minor improvements to this answer. The main new point is about your inline asm: even your original `mov;add` version should have been using `"=&r"`, but I wonder if your `"+r"` was a hack to make it happen to work because you didn't know about early-clobber outputs. Anyway, normally I'd just mess around with asm in a `.s` file so the extra complexity of inline asm is separate. Leave code-gen for simple stuff like `a+b` to the compiler, so it can do constant-propagation or use a memory source operand instead of separate loads or whatever. — Peter Cordes, Nov 17 '21 at 05:16
@PeterCordes Your improvements are accepted. Admittedly, I seldom write assembly except as an exercise while trying to understand some feature of the processor, so my technique is undoubtedly clumsy. You are right about the "=&r", of course. I'll edit the question accordingly. — thb, Nov 17 '21 at 22:04
