
I know that variations of this question have been asked here multiple times, but I'm not asking what the difference between the two is. I would just like some help understanding the assembly behind both forms.

I think my question is more related to the whys than to the what of the difference.

I'm reading Prata's C Primer Plus and, in the part dealing with the increment operator ++ and the difference between using i++ or ++i, the author says that if the operator is used by itself, such as ego++;, it doesn't matter which form we use.

If we look at the disassembly of the following code (compiled with Xcode, Apple LLVM version 9.0.0 (clang-900.0.39.2)):

int main(void)
{
    int a = 1, b = 1;

    a++;
    ++b;

    return 0;

}

we can see that indeed the form used doesn't matter, since the assembly code is the same for both (printing either variable would show a 2 on the screen).

Initialization of a and b:

0x100000f8d <+13>: movl   $0x1, -0x8(%rbp)
0x100000f94 <+20>: movl   $0x1, -0xc(%rbp)

Assembly for a++:

0x100000f9b <+27>: movl   -0x8(%rbp), %ecx
0x100000f9e <+30>: addl   $0x1, %ecx
0x100000fa1 <+33>: movl   %ecx, -0x8(%rbp)

Assembly for ++b:

0x100000fa4 <+36>: movl   -0xc(%rbp), %ecx 
0x100000fa7 <+39>: addl   $0x1, %ecx 
0x100000faa <+42>: movl   %ecx, -0xc(%rbp)

Then the author states that when the operator and its operand are part of a larger expression, for example in an assignment statement, the use of prefix or postfix does make a difference.

For example:

int main(void)
{
    int a = 1, b = 1;
    int c, d;

    c = a++;
    d = ++b;

    return 0;

}

This would print 1 and 2 for c and d, respectively.

And:

Initialization of a and b:

0x100000f46 <+22>: movl   $0x1, -0x8(%rbp)
0x100000f4d <+29>: movl   $0x1, -0xc(%rbp)

Assembly for c = a++; :

0x100000f54 <+36>: movl   -0x8(%rbp), %eax      // eax = a = 1
0x100000f57 <+39>: movl   %eax, %ecx            // ecx = 1
0x100000f59 <+41>: addl   $0x1, %ecx            // ecx = 2
0x100000f5c <+44>: movl   %ecx, -0x8(%rbp)      // a = 2
0x100000f5f <+47>: movl   %eax, -0x10(%rbp)     // c = eax = 1

Assembly for d = ++b; :

0x100000f62 <+50>: movl   -0xc(%rbp), %eax      // eax = b = 1
0x100000f65 <+53>: addl   $0x1, %eax            // eax = 2
0x100000f68 <+56>: movl   %eax, -0xc(%rbp)      // b = eax = 2
0x100000f6b <+59>: movl   %eax, -0x14(%rbp)     // d = eax = 2

Clearly the assembly code is different for the assignments:

  • The form c = a++; uses two registers, eax and ecx. It uses ecx to perform the increment of a by 1, but eax for the assignment, since c must receive the old value of a.

  • The form d = ++b; uses eax for both the increment of b by 1 and the assignment.

My question is:

  • Why is that?
  • What determines that c = a++; requires two registers instead of just one (ecx for example)?
  • Which compiler are you using, as compilers can produce different machine code. – Dragonthoughts Jan 29 '18 at 09:13
  • @Dragonthoughts I'm using Xcode. Apple LLVM version 9.0.0 (clang-900.0.39.2) – asd Jan 29 '18 at 09:18
  • What optimization levels are used? Put that in the question please. – sjsam Jan 29 '18 at 09:25
  • You didn't compile with optimizations on, so your first assembly with the `rbp` clutter is nonsense: that's a literal debug translation of the C source lines for human debugging, not relevant to production machine code. (To stop the optimizer removing the no-effect `++a; b++;` you can declare a and b as `volatile` when doing a quick compiler check.) ... I tried to produce an example with godbolt, but it's futile, as your question leads you down the wrong path when taken literally, and so the examples went wrong as well... hm, will check answers + add something if nobody has corrected you. – Ped7g Jan 29 '18 at 11:26
  • Antonin's and Lundin's answers together make the points my answer would have, so just a summary: `++i` yields the new value in the expression (and `i` contains the new value, so both are equal). `i++` yields the old value (but `i` contains the new one, so you have two different values at the same time). None of that matters much to the optimizer; it is the usage of the returned value vs. the new `i` that can make the postfix increment need more machine code to juggle two values instead of one. And even that can often be avoided in optimized machine code. – Ped7g Jan 29 '18 at 11:38
  • Another point is that there is no assembly level behind ++i and i++; those are C language operators, which affect the C abstract machine state. At the assembly level the compiler produces native machine code simulating that abstract machine and the observable effects of the original C source, so there is no direct 1:1 mapping between `i++` and the machine code produced; only the observable effect is translated, not the `i++` itself. And the optimizer will try hard to translate only the observable effects and reach them in the fastest possible way, so it will gladly skip many an `i++` if possible. – Ped7g Jan 29 '18 at 11:41
  • @Anton Korobeynikov: the OP said in comments they're using Xcode with Apple LLVM. Un-optimized code is definitely compiler-dependent. – Peter Cordes Jan 30 '18 at 11:37
  • @PeterCordes it's both compiler and compiler-version dependent. However, the exact compiler and compiler version are highly irrelevant to the question, since comparing unoptimized code does not make much sense :) – Anton Korobeynikov Jan 31 '18 at 09:50

5 Answers


In the following statements:

a++;
++b;

the values of the expressions a++ and ++b are not used. Here the compiler is only interested in the side effects of these operators (i.e.: incrementing the operand by one). In this context, both operators behave in the same way. So, it's no wonder that these statements result in the same assembly code.

However, in the following statements:

c = a++;
d = ++b;

the values of the expressions a++ and ++b are relevant to the compiler, because they have to be stored in c and d, respectively:

  • d = ++b;: b is incremented and the result of this increment assigned to d.
  • c = a++; : the value of a is first assigned to c and then a is incremented.

Therefore, these operators behave differently in this context. So it makes sense that they result in different assembly code, at least to begin with, without more aggressive optimizations enabled.

JFMR
  • The results c and d aren't used either, so there should be no machine code at all... – Lundin Jan 29 '18 at 10:01
  • @Lundin: Good note again. Though the question is more about best practices, it should be discussed in the light of optimization for it to be more meaningful. – sjsam Jan 29 '18 at 10:26
  • @sjsam The same result is absolutely not stored in `c` in both cases. Just try it. That's not how the operators work at all. – Art Jan 29 '18 at 10:49
  • @Lundin: `gcc -O0` and `clang -O0` compile each C statement separately, spilling / reloading around it, to support async modification of variables with `gdb`, and GDB's `jump` command (resume execution at a new source line). Within a single C statement, gcc especially is not totally braindead the way some other compilers are, and still does stuff like using multiplicative inverses for division by non-power-of-2 constants. Anyway, these implicit memory barriers and support for async modification / jumps are why `gcc -O0` can't optimize away the code. It's pretty clear that's what was used. – Peter Cordes Jan 29 '18 at 13:33
  • @PeterCordes We don't even know which compiler that is used, let alone which compiler options. – Lundin Jan 29 '18 at 13:36
  • @Lundin; yes we do, [from a comment](https://stackoverflow.com/questions/48497636/understanding-the-difference-between-i-and-i-at-the-assembly-level/48497971?noredirect=1#comment83988699_48497636): Xcode with Apple LLVM. It's obviously with `-O0`, because even `-Og` or `-O1` would optimize across C statements. I thought it was gcc from use of `add $1` instead of `inc` before seeing that comment, but I guess LLVM only does that peephole with optimization enabled. – Peter Cordes Jan 29 '18 at 13:37

A good compiler would replace this whole code with c = 1; d = 2;. And if those variables aren't used in turn, the whole program is one big NOP - there should be no machine code generated at all.

But you do get machine code, so you are not enabling the optimizer correctly. Discussing the efficiency of non-optimized C code is quite pointless.

Discussing a particular compiler's failure to optimize the code might be meaningful, if a specific compiler is mentioned. Which isn't the case here.

All this code shows is that your compiler isn't doing a good job, possibly because you didn't enable optimizations, and that's it. No other conclusions can be made. In particular, no meaningful discussion about the behavior of i++ versus ++i is possible.
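One way to make such an analysis meaningful is to put the operators into small functions whose results escape, so the optimizer cannot delete them, and then compare the generated assembly with optimizations on. Function names here are illustrative:

```c
/* Compile with e.g. `cc -O2 -S` and compare the assembly of the two
   functions. Because each result is returned, the optimizer cannot
   remove the increments, yet it is free to keep everything in registers. */
int post_inc(int i) { int c = i++; return c + i; }  /* c = old i, i = old i + 1 */
int pre_inc(int i)  { int c = ++i; return c + i; }  /* c and i both hold the new value */
```

For example, post_inc(1) computes 1 + 2 = 3 while pre_inc(1) computes 2 + 2 = 4, so the two functions are genuinely different and the compiler must materialize that difference somehow.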

Lundin
  • This isn't an answer and should have been a comment. The question is not related to optimizations. – SenhorLucas Mar 24 '23 at 22:02
  • @SenhorLucas The question is about how the ++ operators in C work underneath the hood, which you won't learn by analyzing non-optimized assembly. It's like analysing the best route between Los Angeles and San Francisco by taking the route via Detroit. "Why did it take so long?" Well... – Lundin Mar 27 '23 at 07:02

Your test has a flaw: with optimizations enabled, the compiler replaces your computation with a value it can predict at compile time.

The compiler can, and will, calculate the result in advance during compilation and avoid emitting 'jmp' instructions (the jump back to the top of the while loop, taken each time the condition is still true).

If you try this code:

int a = 0;
int i = 0;

while (i++ < 10)
{
    a += i;
}

The assembly will not use a single jmp instruction.

It will directly assign the value ½ n (n + 1), here (0.5 * 10 * 11) = 55, to the register holding the value of the 'a' variable.

You would have the following assembly output:

mov eax, 55 ; a register
mov ecx, 11 ; i register, this line only if i is still used afterwards.

Whether you write :

int i = 0;
while (i++ < 10)
{
    ...
}

or

int i = -1;
while (++i < 11)
{
    ...
}

will also result in the same assembly output.


If you had a much more complex code you would be able to witness differences in the assembly code.

 a = ++i;

would translate into :

inc rcx          ; increase i by 1. RCX holds the current value of both a and i.

mov rax, rcx ; a = i;

and a = i++; into :

lea rax, [rcx+1] ; RAX now holds the incremented i; RCX still holds the old value, which is a.

(edit: See comment below)

Antonin GAVREL
  • Assuming both variable values were actually needed in the future, and both started in *registers* instead of memory (unlike the `gcc -O0` case), your last section isn't optimal: A smart compiler would actually optimize `a = ++i` into `inc rcx` (now RCX holds the current value of both C variables). It wouldn't copy it until and unless some later operation can't be done with a copy-and-modify instruction like `lea`. e.g. `shr`. Speaking of which, `a = i++` should compile into `lea rax, [rcx+1]` (RAX now holds `i`, RCX now holds `a`). Compilers don't even try to keep variables in one register. – Peter Cordes Jan 29 '18 at 13:24
  • Anyway, `gcc -O0` compiles each statement independently, with a memory barrier, so the actual flaw in the asm the OP is looking as is that the inputs and outputs are in memory, not registers. (And that optimization is disabled, so there's no guarantee it will use the best instruction sequences to implement things, although in this case it looks ok.) The flaw you describe only happens if you compile that same C source with optimization. (Which of course you should do...) – Peter Cordes Jan 29 '18 at 13:28

Both the expressions ++i and i++ have the effect of incrementing i. The difference is that ++i produces a result (a value stored somewhere, for example in a machine register, that can be used within other expressions) equal to the new value of i, whereas i++ produces a result equal to the original value of i.

So, assuming we start with i having a value of 2, the statement

 b = ++i;

has the effect of setting both b and i equal to 3, whereas

 b = i++;

has the effect of setting b equal to 2 and i equal to 3.

In the first case, there is no need to keep track of the original value of i after incrementing i whereas in the second there is. One way of doing this is for the compiler to employ an additional register for i++ compared with ++i.

This is not needed for a trivial expression like

 i++;

since the compiler can immediately detect that the original value of i will not be used (i.e. is discarded).

For simple expressions like b = i++ the compiler could - in principle at least - avoid using an additional register, by simply storing the original value of i in b before incrementing i. However, in slightly more complex expressions such as

c = i++ - *p++;       //  p is a pointer

it can be much more difficult for the compiler to eliminate the need to store old and new values of i and p (unless, of course, the compiler looks ahead and determines how (or if) c, i, and p (and *p) are being used in subsequent code). In more complex expressions (involving multiple variables and interacting operations) the analysis needed can be significant.

It then comes down to implementation choices by developers/designers of the compiler. Practically, compiler vendors compete pretty heavily on compilation time (getting compilation times as small as possible) and, in doing so, may choose not to do all possible code transformations that remove unneeded uses of temporaries (or machine registers).

Peter

You compiled with optimization disabled! For gcc and LLVM, that means each C statement is compiled independently, so you can modify variables in memory with a debugger, and even jump to a different source line. To support this, the compiler can't optimize between C statements at all, and in fact spills / reloads everything between statements.

So the major flaw in your analysis is that you're looking at an asm implementation of that statement where the inputs and outputs are memory, not registers. This is totally unrealistic: compilers keep most "hot" values in registers inside inner loops, and don't need separate copies of a value just because it's assigned to multiple C variables.

Compilers generally (and LLVM in particular, I think) transform the input program into an SSA (Static Single Assignment) internal representation. This is how they track data flow, not according to C variables. (This is why I said "hot values", not "hot variables". A loop induction variable might be totally optimized away into a pointer-increment / compare against end_pointer in a loop over arr[i++]).


c = ++i; produces one value with 2 references to it (one for c, one for i). The result can stay in a single register. If it doesn't optimize into part of some other operation, the asm implementation could be as simple as inc %ecx, with the compiler just using ecx/rcx everywhere that c or i is read before the next modification of either. If the next modification of c can't be done non-destructively (e.g. with a copy-and-modify like lea (,%rcx,4), %edx or shrx %eax, %ecx, %edx), then a mov instruction to copy the register will be emitted.

d = b++; produces one new value, and makes d a reference to the old value of b. It's syntactic sugar for d=b; b+=1;, and compiles into SSA the same as that would. x86 has a copy-and-add instruction, called lea. The compiler doesn't care which register holds which value (except in loops, especially without unrolling, when the end of the loop has to have values in the right registers to jump to the beginning of the loop). But other than that, the compiler can do lea 1(%rbx), %edx to leave %ebx unmodified and make EDX hold the incremented value.


An additional minor flaw in your test is that with optimization disabled, the compiler is trying to compile quickly, not well, so it doesn't look for all possible peephole optimizations even within the statement that it does allow itself to optimize.


If the value of c or d is never read, then it's the same as if you had never done the assignment in the first place. (In un-optimized code, every value is implicitly read by the memory barrier between statements.)


What determines that c = a++; requires two registers instead of just one (ecx for example)?

The surrounding code, as always. +1 can be optimized into other operations, e.g. done with an LEA as part of a shift and/or add. Or built in to an addressing mode.

Or before/after negation, use the 2's complement identity that -x == ~x+1, and use NOT instead of NEG. (Although often you're adding the negated value to something, so it turns into a SUB instead of NEG + ADD, so there isn't a stand-alone NEG you can turn into a NOT.)


++ prefix or postfix is too simple to look at on its own; you always have to consider where the input comes from (does the incremented value have to end up back in memory right away or eventually?) and how the incremented and original values are used.

Basically, un-optimized code is un-interesting. Look at optimized code for short functions. See Matt Godbolt's talk at CppCon2017: “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”, and also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler asm output.

Peter Cordes