
I wonder if it's faster for the processor to negate a number or to do a subtraction. For example:

Is

int a = -3;

more efficient than

int a = 0 - 3;

In other words, is a negation equivalent to subtracting from 0? Or is there a special CPU instruction that negates faster than a subtraction?

I suppose that the compiler does not optimize anything.

Lundin
Jojolatino
  • Short answer: don't worry about microoptimization like this unless you know you have a real performance problem, **and** you've actually profiled your code and know where the bottlenecks actually are. With today's deeply-pipelined CPUs, what you think is the bottleneck quite likely isn't. One thing you do need to be careful of, though, [is how constants like `INT_MIN` are defined.](https://stackoverflow.com/questions/26003893/why-do-we-define-int-min-as-int-max-1) – Andrew Henle Nov 15 '19 at 20:28

4 Answers


(This answer is about negating a runtime variable, like -x or 0-x where constant-propagation doesn't result in a compile-time constant value for x. A constant like 0-3 has no runtime cost.)

I suppose that the compiler does not optimize anything.

That's not a good assumption if you're writing in C. Both are equivalent for any non-terrible compiler because of how integers work, and it would be a missed-optimization bug if one compiled to more efficient code than the other.

If you actually want to ask about asm, then how to negate efficiently depends on the ISA.


But yes, most ISAs can negate with a single instruction, usually by subtracting from an immediate or implicit zero, or from an architectural zero register.

e.g. 32-bit ARM has an rsb (reverse-subtract) instruction that can take an immediate operand. rsb rdst, rsrc, #123 does dst = 123-src. With an immediate of zero, this is just negation.

x86 has a neg instruction: neg eax is exactly equivalent to eax = 0-eax, setting flags the same way.

3-operand architectures with a zero register (hard-wired to zero) can just do something like MIPS subu $t0, $zero, $t0 to do t0 = 0 - t0. It has no need for a special instruction because the $zero register always reads as zero. Similarly AArch64 removed RSB but has a xzr / wzr 64/32-bit zero register. (Although it also has a pseudo-instruction called neg which subtracts from the zero register).

You could see most of this by using a compiler. https://godbolt.org/z/B7N8SK But you'd have to actually compile to machine code and disassemble because gcc/clang tend to use the neg pseudo-instruction on AArch64 and RISC-V. Still, you can see ARM32's rsb r0,r0,#0 for int negate(int x){return -x;}

Peter Cordes

Both are compile time constants, and will generate the same constant initialisation in any reasonable compiler regardless of optimisation.

For example at https://godbolt.org/z/JEMWvS the following code:

void test( void )
{
    int a = -3;
}

void test2( void )
{
    int a = 0-3;
}

Compiled with gcc 9.2 x86-64 -std=c99 -O0 generates:

test:
  push rbp
  mov rbp, rsp
  mov DWORD PTR [rbp-4], -3
  nop
  pop rbp
  ret
test2:
  push rbp
  mov rbp, rsp
  mov DWORD PTR [rbp-4], -3
  nop
  pop rbp
  ret

Using -Os, the code:

void test( void )
{
    volatile int a = -3;
}

void test2( void )
{
    volatile int a = 0-3;
}

generates:

test:
  mov DWORD PTR [rsp-4], -3
  ret
test2:
  mov DWORD PTR [rsp-4], -3
  ret

The volatile is necessary to prevent the compiler from removing the unused variables.

As static data it is even simpler:

int a = -3;
int b = 0-3;

outside of a function generates no executable code, just initialised data objects (initialisation is different from assignment):

a:
  .long -3
b:
  .long -3

Assignment of the above statics:

a = -4 ;
b = 0-4 ;

is still a compiler evaluated constant:

mov DWORD PTR a[rip], -4
mov DWORD PTR b[rip], -4

The take-home here is:

  1. If you are interested, try it and see (with your own compiler, or Godbolt set for your compiler and/or architecture).
  2. Don't sweat the small stuff; let the compiler do its job.
  3. Constant expressions are evaluated at compile time and have no run-time impact.
  4. Writing weird code in the belief you can better the compiler is almost always pointless. Compilers work better with idiomatic code the optimiser can recognise.
Peter Cordes
Clifford
  • You can return an `int` instead of storing it to a `volatile` local. Then the function-body can be a single instruction even for ABIs that don't use a red-zone below the stack pointer the way x86-64 System V does. But yes, good point that the OP's examples were constants! – Peter Cordes Nov 15 '19 at 00:31
  • @PeterCordes The volatile was just a means to an end, and to keep the statements as close to those in the question as possible. The point is however that the optimiser's reduction in code is far more significant than you might expect regardless of attempts to "assist" it in the source. – Clifford Nov 15 '19 at 07:59

From the C language point of view, 0 - 3 is an integer constant expression and those are always calculated at compile-time.

Formal definition from C11 6.6/6:

An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts.

Knowing that these are calculated at compile time is important when writing readable code. For example if you want to declare a char array to hold 5 characters and the null terminator, you can write char str[5+1]; rather than 6, to get self-documenting code telling the reader that you have considered null termination.

Similarly when writing macros, you can make use of integer constant expressions to perform parts of the calculation at compile time.

Lundin

It's hard to tell whether you are asking if subtraction is faster than negation in a general sense, or about this specific case of implementing negation via subtraction from zero. I'll try to answer both.

General Case

For the general case, on most modern CPUs these operations are both very fast: usually each only taking a single cycle to execute, and often having a throughput of more than one per cycle (because CPUs are superscalar). On all recent AMD and Intel CPUs that I checked, both sub and neg execute at the same speed for register and immediate arguments.

Implementing -x

As regards your specific question of implementing the -x operation, it would usually be slightly faster to implement this with a dedicated neg operation than with a sub, because with neg you don't have to prepare the zero register. For example, a negation function int neg(int x) { return -x; } would look something like this with the neg instruction:

neg:
  mov eax, edi
  neg eax

... while implementing it in terms of subtraction would look something like:

neg:
  xor eax, eax
  sub eax, edi

Well ... sub didn't come out looking any worse there, but that's mostly a quirk of the calling convention and the fact that x86 uses a one-argument destructive neg: the result needs to be in eax, so in the neg case one instruction is spent just moving the result to the right register and one doing the negation. The sub version takes two instructions to perform the negation itself: one to zero a register, and one to do the subtraction. It so happens that this lets you avoid the ABI shuffling, because you get to choose the zero register as the result register.

Still, this ABI-related inefficiency wouldn't persist after inlining, so we can say that in some fundamental sense neg is slightly more efficient.

Many ISAs may not have a neg instruction at all, so the question is more or less moot: they may instead have a hardwired zero register, in which case you implement negation via subtraction from that register and there is no cost to set up the zero.

Peter Cordes
BeeOnRope