3

What is the difference between these two macros?

#define swap(a, b)    (((a) ^ (b)) && ((a) ^= (b) ^= (a) ^= (b)))

Or

#define swap(a, b)    (((a) ^ (b)) && ((b) ^= (a) ^= (b), (a) ^= (b)))

I saw the second macro here but couldn't understand why it wasn't written like the first one? Is there a special reason that I missed?

haccks
  • 104,019
  • 25
  • 176
  • 264
yunusaydin
  • 347
  • 2
  • 13

3 Answers3

4

First will invoke undefined behavior in both C99 and C11.

In C99, it can be understood as; they will invoke undefined behavior because of the lack of the sequence points.

C-faq:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.

Explanation:
The first one is modifying a twice between two sequence points and hence behavior is undefined as per statement: Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. That's it (no need to think about b).

C11 documentation says:

6.5 Expressions (p2):

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.84)

In (a) ^= (b) ^= (a) ^= (b), side effect on a is unsequenced and hence invoke undefined behavior. It should be noted that C11 6.5 p1 says that:

[...] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.

This guarantees that in

(a) ^= (b) ^= (a) ^= (b)  
 |      |      |      | 
 1      2      3      4  

all subexpressions 1, 2, 3 and 4 are guaranteed to be computed before the result computation of left most ^= operator. But, this doesn't guarantee that side effect of expression 3 is guaranteed before the value computation of the result of the left most ^= operator.


1. Emphasis is mine.

Community
  • 1
  • 1
haccks
  • 104,019
  • 25
  • 176
  • 264
  • 2
    There's a `,` in the second one. – Uchia Itachi Dec 30 '13 at 15:40
  • 2
    @UchiaItachi: Don't think that makes a difference. `b ^= a ^= b` is still UB. – Oliver Charlesworth Dec 30 '13 at 15:44
  • @OliCharlesworth Is it actually UB? In C11, it seems to me like the value computation of the right-hand `b` would be sequenced before the value computation of the result of `a ^= b`, and therefore before the assignment to `b`. So I don't see how it's UB. It might be UB in C99 though. – interjay Dec 30 '13 at 15:53
  • 1
    @interjay: It's definitely UB in C99. If C11 has added further sequencing constraints, that's fine, but until the majority of the world is using C11, it's probably safer to use C99 as the baseline... (or, at least, make this distinction clear in the answer) – Oliver Charlesworth Dec 30 '13 at 15:54
  • @OliCharlesworth; OK. Second one does not invoke undefined behavior in either of C99 or C11.`the prior value shall be accessed only to determine the value to be stored` can't be applied in case of `(b) ^= (a) ^= (b)`. – haccks Dec 30 '13 at 18:39
2

The first one invokes undefined behavior in C99 for two reasons the most obvious since you are not allowed to modify the same variable more than once within the same sequence point and that macro modifies both a and b more than once while the second one uses the comma operator:

#define swap(a, b)    (((a) ^ (b)) && ((b) ^= (a) ^= (b), (a) ^= (b)))
                                                        ^

which introduces a sequence point but does not remove all undefined behavior in C99 since the prior value of b is being read to compute the value of a but can only be used to determine the value to be stored to b.

Relevant section from C99 draft standard section 6.5 Expressions paragraph 2 says (emphasis mine going forward):

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.72) Furthermore, the prior value shall be read only to determine the value to be stored.73)

and for the comma operator, from section 6.5.17 Comma operator paragraph 2 says:

The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation.[...]

Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
  • 3
    Isn't there just *one* reason; namely lack of sequence points? – Oliver Charlesworth Dec 30 '13 at 16:01
  • 1
    @OliCharlesworth the second reason is that the prior value of `a` and `b` are being used other than to determine the the value to be stored which is why adding the comma operator does not eliminate all undefined behavior. – Shafik Yaghmour Dec 30 '13 at 16:03
  • 2
    But that's the same reason; that leads to UB *because* there aren't enough sequence points in `b ^= a ^= b`. Adding the comma operator still doesn't add enough sequence points. – Oliver Charlesworth Dec 30 '13 at 16:05
  • @OliCharlesworth that is a fair point, but they are subtly different reasons as observed by the second piece of code which someone obviously thought removed all undefined behavior. – Shafik Yaghmour Dec 30 '13 at 16:09
  • @ShafikYaghmour; I think we are wrong. Second one does't invoke undefined behavior in C99 too. The statement *prior value shall be read only to determine the value to be stored.* has nothing to do with `(b) ^= (a) ^= (b)`. – haccks Dec 30 '13 at 17:27
1

To better understand why the first one is undefined, here is another way to present it:
It's because in C, you have no control over order of execution among sub-expressions:

a = a^(b=b^(a=a^b))

For the first a occurring after =, the C compiler has the choice to use initial value of a, or modified value of a. It's thus clearly ambiguous, and lead to undefined behaviour.

The second one looks OK to me, as non ambiguous:

b = b ^(a=a^b)

The fact that a and b occur in the first part of expression (a^b)&&... does not seem a problem to me, because && forces the first part to be evaluated first. But then, I prefer to let the experts dissect the standard, I'm not an expert...

aka.nice
  • 9,100
  • 1
  • 28
  • 40