
I have just started to learn C, and I get that

*a = *b;
a++;
b++;

and

*a++ = *b++

are equivalent, but is that what's actually happening when the line

*a++ = *b++

is called? Can someone clarify how the compiler is interpreting the second line? I know about right-to-left precedence and such, but can someone precisely write the steps the compiler uses to interpret this line of code?

pyrrhic
  • The compiler is free to do whatever it likes so long as the program behaves correctly, so there is not much anyone can say about how this will be translated in general. All you can do is compile it with whatever compiler you use and look at the generated code - this will just tell you what your particular compiler does in this particular case. – Paul R Jul 29 '13 at 14:38
  • Try to observe and understand the asm code generated by `gcc -S`. – Grijesh Chauhan Jul 29 '13 at 14:39
  • In `*a++ = *b++;`, both `++` operators are postfix, so first `*b` is assigned to `*a`, and then `b++` and `a++` are performed. – Grijesh Chauhan Jul 29 '13 at 14:40
  • You are making the common beginner mistake of confusing *precedence* with *order of side effects*. They actually have very little to do with each other. When you say `A() + B() * C()` in C, there is no requirement that `B()` and `C()` be called before `A()` just because `*` is higher precedence than `+`. The functions can be called in *any order*, so long as `B()` and `C()` are called before `*`, `A()` is called before `+`, and `*` is called before `+`. The compiler can choose *any* order of calls that satisfies those constraints. – Eric Lippert Jul 29 '13 at 22:19
  • @GrijeshChauhan: You are making the common beginner mistake of assuming that the postfix operation has a defined point at which the increments happen. It does not. A conforming compiler is allowed to do the increments at any time. If you do not understand why that is, read my answer carefully. – Eric Lippert Jul 29 '13 at 22:27
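
The point about `A() + B() * C()` can be observed with a small sketch. The names `record`, `precedence_demo`, `last_call_order` and the return values 2, 3 and 4 are made up for illustration; only the result is guaranteed, while the recorded call order depends on the compiler.

```c
#include <assert.h>
#include <string.h>

static char call_log[4];   /* room for "ABC" in some order plus a NUL */
static int  call_count;

/* Hypothetical helper: remember which function ran, then return its value. */
static int record(char name, int value) {
    call_log[call_count++] = name;
    return value;
}

static int A(void) { return record('A', 2); }
static int B(void) { return record('B', 3); }
static int C(void) { return record('C', 4); }

/* Evaluates A() + B() * C() and returns the result. */
int precedence_demo(void) {
    call_count = 0;
    memset(call_log, 0, sizeof call_log);
    int result = A() + B() * C();
    assert(result == 14);      /* precedence fixes the grouping: 2 + (3 * 4) */
    assert(call_count == 3);   /* each function ran once, in some order */
    return result;
}

/* The order in which the three calls happened during the last run. */
const char *last_call_order(void) {
    return call_log;
}
```

Calling `precedence_demo()` and then printing `last_call_order()` shows the order one particular compiler chose; a different compiler or optimization level may legitimately print a different permutation.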

3 Answers


You said that you believe that:

*a = *b; a++; b++;

is equivalent to

*a++ = *b++;

but that is false, so you have a false belief. Let's correct your false belief.

In the first case, the following things must happen:

  • VAR: *a must be evaluated to produce a variable, call it var
  • VAL: *b must be evaluated to produce a value, call it val
  • ASSIGN: val must be assigned to var.
  • INCA: a must be incremented.
  • INCB: b must be incremented.

What are the constraints on how the compiler may order those?

  • VAR and VAL must happen before ASSIGN.
  • ASSIGN must happen before INCA.
  • INCA must happen before INCB.

The rule here is that all the side effects of one statement have to be complete before the next statement starts. So there are two legal orderings. VAR VAL ASSIGN INCA INCB, or VAL VAR ASSIGN INCA INCB.

Now let's consider the second case.

*a++ = *b++;

We have the same five operations, but the constraints on their ordering are completely different because these are all in the same statement, so the rule about statements does not apply. Now the constraints are:

  • VAR and VAL must happen before ASSIGN.
  • the evaluation of VAR must use the original value of a
  • the evaluation of VAL must use the original value of b

Note that I did not say that the increments are required to happen afterwards. Rather, I said that the original values must be used. As long as the original value is used, the increment can happen at any time.

So for example, it would be perfectly legal to generate this as

var = a;
a = a + 1; // increment a before assign
*var = *b;
b = b + 1; // increment b after assign

It would also be legal to do this:

val = *b;
b = b + 1; // increment b before assign
*a = val;
a = a + 1; // increment a after assign

It would also be legal to do it as you suggest: do the assignment first, and then both increments in left-to-right order. And it would also be legal to do the assignment first, and then both increments in right-to-left order.

A C compiler is given broad latitude to generate code however it likes for this kind of expression. Make sure this is very clear in your mind, because most people get this wrong: just because the ++ comes after the variable does not mean that the increment happens late. The increment can happen as early as the compiler likes as long as the compiler ensures that the original value is used.
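
As a rough sketch, the two hand-written orderings can be spelled out in real C and compared with the single-statement form. The function names here are hypothetical, and the sketch assumes `a` and `b` point into distinct arrays of `int`. All three variants must leave the same observable state, which is why the compiler is free to choose among them.

```c
#include <assert.h>

/* Ordering 1: a is incremented before the assignment, b after it. */
static void copy_inc_a_early(int **a, int **b) {
    int *var = *a;   /* VAR: uses the original value of a */
    *a = *a + 1;     /* INCA happens early */
    *var = **b;      /* VAL and ASSIGN: uses the original value of b */
    *b = *b + 1;     /* INCB */
}

/* Ordering 2: b is incremented before the assignment, a after it. */
static void copy_inc_b_early(int **a, int **b) {
    int val = **b;   /* VAL: uses the original value of b */
    *b = *b + 1;     /* INCB happens early */
    **a = val;       /* VAR and ASSIGN: a has not been changed yet */
    *a = *a + 1;     /* INCA */
}

/* The single-statement form; the compiler picks the ordering. */
static void copy_builtin(int **a, int **b) {
    *(*a)++ = *(*b)++;
}

/* Runs every variant on fresh data and returns how many were checked. */
int check_orderings(void) {
    void (*variants[])(int **, int **) = {
        copy_inc_a_early, copy_inc_b_early, copy_builtin
    };
    int checked = 0;
    for (int i = 0; i < 3; i++) {
        int dst[2] = {0, 0};
        int src[2] = {7, 9};
        int *a = dst, *b = src;
        variants[i](&a, &b);
        assert(dst[0] == 7 && dst[1] == 0);     /* one element was copied */
        assert(a == dst + 1 && b == src + 1);   /* both pointers advanced */
        checked++;
    }
    return checked;
}
```

Because the final state is identical in every case, a test like `check_orderings()` cannot tell which ordering the compiler used for the built-in form; only the generated code can.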

That's the rule for C and C++. In C#, the language specification requires that the side effects of the left side of an assignment happen before the side effects of the right side of an assignment, and that both happen before the side effect of the assignment. That same code in C# would be required to be generated as:

var_a = a;
a = a + 1;
// must pointer check var_a here
var_b = b;
b = b + 1;
val = *var_b; // pointer checks var_b
*var_a = val;

The "pointer check" is the point at which C# requires that the runtime verify that var_a is a valid pointer; in other words, that *var_a is actually a variable. If it is not then it must throw an exception before b is evaluated.

Again, a C compiler is permitted to do it the C# way, but not required to.

Eric Lippert
  • So the expression `a = *p++;` can be performed in 2 ways: **(1)** first `p` is assigned to some VAR; then, because `++` has higher precedence, `p` is updated to point to the next location; and then, in a *second step*, the old value of `p` (VAR) is used to assign to `a`, as `a = *VAR`. **(2)** first `*p` is assigned to `a`, because `++` is a postfix operator, and then `p` is updated to point to the next location. – Grijesh Chauhan Jul 30 '13 at 07:02
  • @GrijeshChauhan: Correct; either ordering is legal in C. In C# the specification requires that first, the side effects of the left hand side, if any, are produced. Then the side effect of the increment of `p` happens, then the side effect of dereferencing the old value of `p` happens (remember, dereferencing produces a side effect in C# if the pointer is invalid), and then the side effect of the assignment happens. That is, in C# side effects happen *left to right* for subexpressions, and *in precedence order* for operators. – Eric Lippert Jul 30 '13 at 14:03
  • Never knew before reading this answer that the expression `a = *p++` can be evaluated like `var = p;` `p = p + 1;` `a = *var`! – haccks Jul 31 '13 at 21:49
  • @haccks: same to you. I also believed that despite the naming, this doesn't mean that pre-increment will first write to memory and then return, and it doesn't mean that post-increment will first return and then write to memory. But this misconception has been removed now. – Destructor Jan 30 '16 at 06:42
  • @EricLippert: var=a; is wrong I think in code generation in second case. It should be var=*a; Correct me If I am wrong. – Destructor Jan 30 '16 at 07:02

1)

*a = *b;
a++;
b++;

is equivalent to

*a = *b;
a = a+1;
b = b+1;

2)

x = *a++

is equivalent to

x = *a;
a = a+1;

and

*b++ = x

is equivalent to

*b = x;
b = b+1;

so

*a++ = *b++

is equivalent to

*a = *b;
a = a+1;
b = b+1;

3)

*(++a) = *(++b)

is equivalent to

a = a+1;
b = b+1;
*a = *b;
MOHAMED
  • Is it in fact true that `x=*a++` is *required* to be equivalent to `x=*a;a=a+1;`? Could a conforming compiler implement it as `temp=a; a=a+1; x=*temp;` ? – Eric Lippert Jul 29 '13 at 21:43
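
The two lowerings of `x = *a++` raised in the comment above can be compared directly. This is only a sketch: the function names and the sample data are invented for illustration, and it assumes `a` points at a valid `int`.

```c
#include <assert.h>

/* Lowering 1: x = *a; a = a + 1; */
static int read_then_increment(const int **a) {
    int x = **a;
    *a = *a + 1;
    return x;
}

/* Lowering 2: temp = a; a = a + 1; x = *temp; */
static int increment_then_read(const int **a) {
    const int *temp = *a;   /* keep the original pointer value */
    *a = *a + 1;            /* the increment happens first */
    return *temp;           /* but the read still uses the original pointer */
}

/* Returns 1 if both lowerings yield the same value and the same pointer. */
int lowerings_match(void) {
    const int data[2] = {5, 6};
    const int *p = data;
    const int *q = data;
    int x1 = read_then_increment(&p);
    int x2 = increment_then_read(&q);
    assert(x1 == 5 && x2 == 5);
    assert(p == data + 1 && q == data + 1);
    return x1 == x2 && p == q;
}
```

Both versions read through the original pointer value, so a program cannot distinguish them; that is what allows a conforming compiler to emit either one.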

The exact sequence in which the expressions are evaluated and side effects applied is left unspecified; all that is guaranteed is that the result of *b++ (the value that b currently points to) is assigned to the result of *a++ (the object that a currently points to), and that both pointers are advanced. The exact order of operations will vary.

If you want to know how your platform handles it, you can look at the generated machine code, but be aware that it can still vary depending on compiler settings or surrounding code.
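
A small function to experiment with might look like the sketch below. `copy_string` is a made-up name for a minimal `strcpy`-style loop built on `*dst++ = *src++`, and it assumes `dst` has room for the whole string including the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies a NUL-terminated string and returns dst. */
char *copy_string(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    char *start = dst;
    /* Each pass copies one char and advances both pointers; the loop
       stops after the terminating NUL has been copied. */
    while ((*dst++ = *src++) != '\0') {
        /* empty body: all the work is in the condition */
    }
    return start;
}
```

Saving this to a file and compiling it with `gcc -S`, at a couple of different optimization levels, shows where that particular compiler placed the increments relative to the store.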

John Bode