
When I read this question I remembered someone once telling me (many years ago) that, from an assembler point of view, these two operations are very different:

n = 0;

n = n - n;

Is this true, and if it is, why is it so?

EDIT: As pointed out by some replies, I guess this would be fairly easy for a compiler to optimize into the same thing. But what I find interesting is why they would differ if the compiler had a completely general approach.

Peter Cordes
sharkin
  • Because they are not the same? – Johan Kotlinski May 15 '09 at 09:33
  • That's what the question is about. The person telling me this said that "under the hood" they would produce different machine code and that one was faster than the other. Unfortunately I don't recall the complete argument. – sharkin May 15 '09 at 09:37

9 Answers


When writing assembler code, you often used:

xor eax, eax

instead of

mov eax, 0

That is because the first instruction has only the opcode and register operands, with no immediate argument. Your CPU will do that in 1 cycle (instead of 2). I think your case is something similar (although using sub).

dfa
tanascius
  • Yes, you could say sub eax,eax. The only difference is the flags that get set by the operation. –  May 15 '09 at 09:52
  • You can't really be that sure about *cycles*. The reason is not really cycles, directly. xor eax,eax produces a shorter (3 bytes: 6631C0) instruction than mov eax,0 (6 bytes: 66B800000000) on x86 architecture. sub eax,eax also produces a 3 byte instruction. While for current processors there's not much difference between a sub and xor, xor requires a much simpler circuit and has potential to be faster – Mehrdad Afshari May 15 '09 at 10:08
  • absolutely correct, this is all about implicit mnemonic parameters and thus reduced instruction size. – none May 22 '09 at 18:15
  • Some architectures even have a special register whose value is always 0 (MIPS at least). – Will Apr 25 '12 at 21:28

Compiler VC++ 6.0, without optimisations:

4:        n = 0;
0040102F   mov         dword ptr [ebp-4],0
5:
6:        n = n - n;
00401036   mov         eax,dword ptr [ebp-4]
00401039   sub         eax,dword ptr [ebp-4]
0040103C   mov         dword ptr [ebp-4],eax

An optimizing compiler will produce the same assembly code for the two.

Eli Bendersky
  • If *n* is of a non-volatile integer type, most likely yes, but not if *n* is volatile (as mouviciel points out), and/or if it's of a floating-point type. With floats, *n-n* does not always equal 0.0, due to NaN and INF. – Max Barraclough May 05 '20 at 22:12

In the early days, memory and CPU cycles were scarce. That led to a lot of so-called "peep-hole optimizations". Let's look at the code:

    move.l #0,d0

    moveq.l #0,d0

    sub.l a0,a0

The first instruction would need two bytes for the op-code and then four bytes for the value (0). That meant four bytes wasted plus you'd need to access the memory twice (once for the opcode and once for the data). Sloooow.

moveq.l was better since it merged the data into the op-code, but it could only load values between -128 and 127 (an 8-bit immediate, sign-extended to 32 bits) into a register. And you were limited to data registers only; there was no quick way to clear an address register. You'd have to clear a data register and then copy it into an address register (two op-codes. Bad.).

Which led to the last operation, which works on any register and needs only two bytes, i.e. a single memory read. Translated into C, you'd get

n = n - n;

which would work for the most often used types of n (integer or pointer registers in asm; in C, though, p - p on a pointer yields a ptrdiff_t, not a pointer).

chqrlie
Aaron Digulla
  • Are you saying that the n = n-n variant actually is/was more efficient than n = 0? – sharkin May 15 '09 at 09:46
  • That will usually be the case if the number is already in a register – stephan May 15 '09 at 10:02
  • Amazing. This is exactly the kind of answer I hoped to get. – sharkin May 15 '09 at 10:21
  • @R.A.: Yes, n-n is more efficient on M68000 CPUs for address registers. Moveq.l is faster for data registers since the m68k had only a 16-bit ALU, but sub.l is more general. Both need 16 bits of memory. Funnily, clr.l (set register to 0) is slower than moveq.l ;) – Aaron Digulla May 15 '09 at 11:55

It may depend on whether n is declared as volatile or not.

mouviciel
  • True, but I can't think of a real-life case where one will make n volatile and then do n = n - n – Eli Bendersky May 15 '09 at 09:42
  • Sure, but I can't think of a real-life case where one will do n=n-n in the first place. – mouviciel May 15 '09 at 09:44
  • Thanks for the reply, but using "volatile" is also very "real-life" to me at least. This is just a theoretical/hypothetical question for educational purposes. – sharkin May 15 '09 at 09:49

The assembly-language technique of zeroing a register by subtracting it from itself or XORing it with itself is an interesting one, but it doesn't really translate to C.

Any optimising C compiler will use this technique if it makes sense, and trying to write it out explicitly is unlikely to achieve anything.

Artelius

In C they only differ (for integer types) if your compiler sucks (or you disabled optimization like an MSVC answer shows).

Perhaps the person who told you this was trying to describe an asm instruction like sub reg,reg using C syntax, not talking about how such a statement would actually compile with a modern optimizing compiler? In that case I wouldn't say "very different" for most x86 CPUs; most of them special-case sub same,same as a zeroing idiom, like xor same,same. See What is the best way to set a register to zero in x86 assembly: xor, mov or and?

That makes an asm sub reg,reg similar to mov reg,0, with somewhat better code size. (But yes, some unique benefits wrt. partial-register renaming on Intel P6-family that you can only get from zeroing idioms, not mov).


They could differ in C if your compiler is trying to implement the mostly-deprecated memory_order_consume semantics from <stdatomic.h> on a weakly-ordered ISA like ARM or PowerPC, where n = 0 breaks the dependency on the old value but n = n-n; still "carries a dependency", so a load like array[n] will be dependency-ordered after n = atomic_load_explicit(&shared_var, memory_order_consume). See Memory order consume usage in C11 for more details.

In practice compilers gave up on trying to get that dependency tracking right and promote consume loads to acquire. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0371r1.html and When should you not use [[carries_dependency]]?

But in asm for weakly-ordered ISAs, sub dst, same, same is required to still carry a dependency on the input register, just like in C. (Most weakly-ordered ISAs are RISCs with fixed-width instructions, so avoiding an immediate operand doesn't make the machine code any smaller. Thus there is no historical use of shorter zeroing idioms like sub r1, r1, r1 even on ISAs like ARM that don't have an architectural zero register. mov r1, #0 is the same size and at least as efficient as any other way. On MIPS you'd just move $v0, $zero.)

So yes, for those non-x86 ISAs, they are very different in asm. n=0 avoids any false dependency on the old value of the variable (register), while n=n-n can't execute until the old value of n is ready.


Only x86 special-cases sub same,same and xor same,same as a dependency-breaking zeroing idiom like mov eax, imm32, because mov eax, 0 is 5 bytes but xor eax,eax is only 2. So there was a long history of using this peephole optimization before out-of-order execution CPUs, and such CPUs needed to run existing code efficiently. What is the best way to set a register to zero in x86 assembly: xor, mov or and? explains the details.

Unless you're writing by hand in x86 asm, write 0 like a normal person instead of n-n or n^n, and let the compiler use xor-zeroing as a peephole optimization.

Asm for other ISAs might have other peepholes, e.g. another answer mentions m68k. But again, if you're writing in C this is the compiler's job. Write 0 when you mean 0. Trying to "hand hold" the compiler into using an asm peephole is very unlikely to work with optimization disabled, and with optimization enabled the compiler will efficiently zero a register if it needs to.

curiousguy
Peter Cordes

not sure about assembly and such, but generally,

n=0
n=n-n

isn't always equal if n is floating point; see here http://www.codinghorror.com/blog/archives/001266.html

Sujoy

Here are some corner cases where the behavior is different for n = 0 and n = n - n:

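  • if n has a floating point type, the result will differ from 0 for specific values: Infinity, -Infinity and NaN all give NaN. Also, in the round-toward-negative rounding mode, n - n yields -0.0 for finite n, whereas n = 0 stores +0.0.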

  • if n is defined as volatile: the first expression will generate a single store into the corresponding memory location, while the second expression will generate two loads and a store, furthermore if n is the location of a hardware register, the 2 loads might yield different values, causing the write to store a non 0 value.

  • if optimisations are disabled, the compiler might generate different code for these 2 expressions even for plain int n, which might or might not execute at the same speed.

chqrlie