In C they only differ (for integer types) if your compiler sucks (or you disabled optimization like an MSVC answer shows).
Perhaps the person who told you this was trying to describe an asm instruction like sub reg,reg
using C syntax, not talking about how such a statement would actually compile with a modern optimizing compiler? In which case I wouldn't say "very different" for most x86 CPUs; most do special case sub same,same
as a zeroing idiom, like xor same,same
. See What is the best way to set a register to zero in x86 assembly: xor, mov or and?
That makes an asm sub reg,reg
similar to mov reg,0
, with somewhat better code size. (Zeroing idioms also have some unique benefits for partial-register renaming on Intel P6-family that you can't get from mov.)
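To check the "they don't differ in C" claim yourself, here is a minimal sketch you can compile with optimization enabled and compare the asm for each function. The function names are hypothetical, made up for this example. With a modern optimizing compiler I'd expect all three to compile to the same zeroing instruction plus a return, but look at your own compiler's output rather than taking that on faith.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */

/* The normal way: assign the constant. */
int zero_by_assign(int n) {
    n = 0;
    return n;
}

/* n - n is always 0 for integer types; it can't overflow. */
int zero_by_sub(int n) {
    n = n - n;
    return n;
}

/* Same idea with XOR, using unsigned to keep the bitwise op unsurprising. */
unsigned zero_by_xor(unsigned n) {
    n = n ^ n;
    return n;
}

/* Sanity check that all three agree on the value, whatever the input. */
void check_all_zero(void) {
    assert(zero_by_assign(42) == 0);
    assert(zero_by_sub(42) == 0);
    assert(zero_by_xor(42u) == 0u);
}
```

Something like gcc -O2 -S (or a compiler explorer) shows the generated asm; with optimization disabled, the n - n version may really load n and subtract.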
They could differ in C if your compiler is trying to implement the mostly-deprecated memory_order_consume
semantics from <stdatomic.h>
on a weakly-ordered ISA like ARM or PowerPC, where n=0
breaks the dependency on the old value but n = n-n;
still "carries a dependency", so a load like array[n]
will be dependency-ordered after n = atomic_load_explicit(&shared_var, memory_order_consume)
. See Memory order consume usage in C11 for more details.
In practice, compilers gave up on trying to get that dependency tracking right and instead promote consume
loads to acquire
. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0371r1.html and When should you not use [[carries_dependency]]?
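The consume case can be sketched in C11 like this. It's only an illustration of the formal difference: shared_var and array are stand-ins for whatever you'd really share between threads, the function names are hypothetical, and since real compilers treat consume as acquire you won't see a difference in the generated code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in shared state, assumed for this example. */
_Atomic int shared_var;
int array[4] = {10, 20, 30, 40};

/* n - n is 0, but formally it still carries a dependency from the
   consume load, so the array[n] load is dependency-ordered after it. */
int read_with_dependency(void) {
    int n = atomic_load_explicit(&shared_var, memory_order_consume);
    n = n - n;
    return array[n];
}

/* Assigning the constant 0 breaks the dependency chain: the index
   no longer depends on the loaded value in any way. */
int read_without_dependency(void) {
    int n = atomic_load_explicit(&shared_var, memory_order_consume);
    n = 0;
    return array[n];
}
```

Both functions return array[0]; the only difference is in what ordering the abstract machine guarantees relative to the writer of shared_var.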
But in asm for weakly-ordered ISAs, sub dst, same, same
is required to still carry a dependency on the input register, just like in C. (Most weakly-ordered ISAs are RISCs with fixed-width instructions, so avoiding an immediate operand doesn't make the machine code any smaller. Thus there is no historical use of shorter zeroing idioms like sub r1, r1, r1
even on ISAs like ARM that don't have an architectural zero register. mov r1, #0
is the same size and at least as efficient as any other way. On MIPS you'd just move $v0, $zero
)
So yes, for those non-x86 ISAs, they are very different in asm. n=0
avoids any false dependency on the old value of the variable (register), while n=n-n
can't execute until the old value of n
is ready.
Only x86 special-cases sub same,same
and xor same,same
as a dependency-breaking zeroing idiom like mov eax, imm32
, because mov eax, 0
is 5 bytes but xor eax,eax
is only 2. So there was a long history of using this peephole optimization before out-of-order execution CPUs existed, and those CPUs needed to run existing code efficiently. What is the best way to set a register to zero in x86 assembly: xor, mov or and? explains the details.
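To make the size difference concrete, here are the machine-code bytes written out as C arrays. The array names are made up for this example, and each instruction has an alternate 2-byte opcode form (33 C0 for xor, 2B C0 for sub), so an assembler may pick either.

```c
#include <assert.h>

/* mov eax, 0 : opcode B8 followed by a 4-byte immediate */
const unsigned char mov_eax_0[] = {0xB8, 0x00, 0x00, 0x00, 0x00};

/* xor eax, eax : opcode 31 plus a ModRM byte */
const unsigned char xor_eax_eax[] = {0x31, 0xC0};

/* sub eax, eax : opcode 29 plus a ModRM byte */
const unsigned char sub_eax_eax[] = {0x29, 0xC0};

/* Check the sizes quoted in the text: 5 bytes vs. 2. */
void check_encoding_sizes(void) {
    assert(sizeof mov_eax_0 == 5);
    assert(sizeof xor_eax_eax == 2);
    assert(sizeof sub_eax_eax == 2);
}
```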
Unless you're writing by hand in x86 asm, write 0
like a normal person instead of n-n
or n^n
, and let the compiler use xor-zeroing as a peephole optimization.
Asm for other ISAs might have other peepholes, e.g. another answer mentions m68k. But again, if you're writing in C this is the compiler's job. Write 0
when you mean 0
. Trying to "hand hold" the compiler into using an asm peephole is very unlikely to work with optimization disabled, and with optimization enabled the compiler will efficiently zero a register if it needs to.