57

I stumbled into this code for swapping two integers without using a temporary variable or the use of bitwise operators.

int main(){

    int a=2,b=3;
    printf("a=%d,b=%d",a,b);
    a=(a+b)-(b=a);
    printf("\na=%d,b=%d",a,b);
    return 0;
}

But I think this code has undefined behavior in the swap statement a = (a+b) - (b=a); as it does not contain any sequence points to determine the order of evaluation.

My question is: Is this an acceptable solution to swap two integers?

ashfaque
  • 522
  • 4
  • 8
  • 26
    In C++ just use std::swap – user1781290 Dec 27 '13 at 12:27
  • 46
    Why write code that is difficult to read for the sake of a few bytes? – Ed Heal Dec 27 '13 at 12:27
  • 12
    @EdHeal Byte by Byte the Megabyte is made (Couldn't resist) – Suvarna Pattayil Dec 27 '13 at 12:31
  • I propose to name this kind of coding "TIMTOWTBI" ? (B=Break) – Olivier Dulac Dec 27 '13 at 13:31
  • 7
    @EdHeal "sake of a few bytes" sounds like the movie cliche "sake of a few lives". If you put it in that perspective, you sound like a monster. |=^] – But I'm Not A Wrapper Class Dec 27 '13 at 16:50
  • 12
    Bear in mind that the saved "lines of code" and "variables" don't accelerate your program's performance any...on the contrary, they probably harm it. I'm willing to bet if you look at the linked ASM it's probably takes more instructions and more cycles to execute than a naive solution. – Nick T Dec 27 '13 at 16:50
  • 2
    20+ up votes on a first question. Maybe not a record, but impressive. – chux - Reinstate Monica Dec 27 '13 at 18:35
  • 1
    @MohammadS. I'd say a life is closer in equivalence to a megabyte, though. Maybe a gigabyte. – JAB Dec 27 '13 at 18:40
  • 2
    C and C++ have a fundamental difference: C++ generally strives to preserve lvalue-ness of expressions while C generally discards it immediately. This difference has a massive impact on sequencing in C and C++, especially when it comes to assignment operator. A proper answer should treat C and C++ separately. Or, maybe, the author of the question should decide which language he wants to focus on in this question. All to often question authors gratuitously slap both [C] and [C++] tags on a question assuming that the answer is "the same" for both. It is not. – AnT stands with Russia Dec 27 '13 at 18:56
  • 1
    In this case the answer is indeed "the same" in a sense that the behavior is undefined in both languages. But the underlying mechanics is still different. – AnT stands with Russia Dec 27 '13 at 18:57
  • @MohammadSHow are you compare a human life with a few extra byes of memory (that memory is temporary)? – Ed Heal Dec 27 '13 at 19:41
  • 1
    @chux; It is now 36 up votes and 2750+ views! – haccks Dec 27 '13 at 20:55
  • This is one reason I miss the feature in the CLU language which let you have multivalued expressions. You could write things like "x,y=cartesian(r,theta);"... and, more significantly for this question, you could write "x,y=y,x;" and have the right thing happen. It's actually not a difficult thing for a compiler or interpreter to implement, so I've wondered why it didn't catch on. – keshlam Dec 27 '13 at 22:04
  • 2
    I don't agree that this is a dupe. This question is more general (allowing answers such as code clarity and overflow as well) – Martin Smith Dec 28 '13 at 10:58
  • 1
    @haccks :yeah sure.I am still learning about how things work around here.This being my first question, i hope you guys understand.I never expected these brilliant kinda answers from guys with wealth of experience like you all have. I am just a nobody, a novice struggling to unlearn what i was taught as a part of my college curriculum and relearn C programming again. I very much appreciate you guys helping me out. – ashfaque Dec 30 '13 at 12:17
  • 1
    @ashfaque; You are lucky. On first day you got 36 votes and more that 3000 views with great responses from users. Generally on first question people get downvotes :). Best of luck. – haccks Dec 30 '13 at 12:24
  • 1
    @haccks :I still don't get why this question has been marked as a duplicate. Therein the main question asked is "Why can the operation be undefined?" .Mine query was different as i already knew about sequence points warning but needed to know if anyhow this was acceptable in real world programming( as it does the job of swapping the integers) – ashfaque Dec 30 '13 at 12:32
  • 1
    @ashfaque; Different users have different views. After the answer given by [remyabel](http://stackoverflow.com/a/20805797/2455888) users thought this is a dupe. I voted to reopen :). – haccks Dec 30 '13 at 12:59
  • 1
    I do not think that this question is exact dupe of the marked one. OP already knows about the order of evaluation and sequence point of the expression unlike the marked one in which OP is confused about the order of evaluation. Voted for reopen. – haccks Dec 30 '13 at 15:22
  • [This is related question](http://stackoverflow.com/questions/18040932/a-a-b-b-a-c-vs-php) – ST3 Mar 08 '14 at 19:28

10 Answers10

82

No. This is not acceptable. This code invokes Undefined behavior. This is because of the operation on b is not defined. In the expression

a=(a+b)-(b=a);  

it is not certain whether b gets modified first or its value gets used in the expression (a+b) because of the lack of the sequence point.
See what standard syas:

C11: 6.5 Expressions:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.84)1.

Read C-faq- 3.8 and this answer for more detailed explanation of sequence point and undefined behavior.


1. Emphasis is mine.

Community
  • 1
  • 1
haccks
  • 104,019
  • 25
  • 176
  • 264
  • @LightnessRacesinOrbit; What should be proved to explain this? OP was not sure whether it it acceptable or not and I clarified that. – haccks Dec 27 '13 at 16:00
  • 7
    That what you say is true. At the moment it is just an assertion with no proof. Typically we quote the standard for such things. – Lightness Races in Orbit Dec 27 '13 at 16:06
  • This is _unspecified_ behavior. – mafso Dec 27 '13 at 16:10
  • 2
    @mafso; NO. It is undefined behavior. Wait. I am going to quote standard. – haccks Dec 27 '13 at 16:10
  • @haccks: I can't find it in the normative part, but Annex J.1 says so... thanks in advance for looking it up. – mafso Dec 27 '13 at 16:18
  • Thanks for the edit. I see my mistake. Missed the "using the value of the _same_ scalar object" part. – mafso Dec 27 '13 at 16:20
  • @LightnessRacesinOrbit; Edited the answer now. Is it fine now? :) – haccks Dec 27 '13 at 16:23
  • In simple terms, the outcome depends on the compiler? – Jack M Dec 27 '13 at 17:15
  • @JackM; YES. In case of UB, anything could be happen. The outcome may be expected or unexpected, program may crash, you may get different result on different compilers even result may vary on different versions of the same compiler. – haccks Dec 27 '13 at 17:23
  • 4
    @JackM: The compiler, the date and time, the ambient room temperature, the bits that happen to lie in data segment 0xFC0143FC at program start, the name of your pet dog.. – Lightness Races in Orbit Dec 27 '13 at 18:36
  • 5
    One problem with this answer is that the standard quote uses the new concept of *sequencing*, while the question and the answer talks about the old concept of *sequence points*. *Sequencing* supersedes *sequence points*. And *sequencing* is quite different from *sequence points*. Actually, some formerly undefined expressions became perfectly defined after the switch to the new concept, especially the ones that use assignment operator in C++. It is true that this expression is undefined under either, but still the mix might be confusing. – AnT stands with Russia Dec 27 '13 at 18:51
  • @AndreyT; I know sequencing and sequence points are different. But if someone read the provided link in the answer then it would be crystal clear and he will not be confused then. – haccks Dec 27 '13 at 19:02
48

My Question is - Is this an acceptable solution to swap two integers?

Acceptable to who? If you're asking if it is acceptable to me, that would not get past any code review I was in, believe me.

why is a=(a+b)-(b=a) a bad choice for swapping two integers?

For the following reasons:

1) As you note, there is no guarantee in C that it actually does that. It could do anything.

2) Suppose for the sake of argument that it really does swap two integers, as it does in C#. (C# guarantees that side effects happen left-to-right.) The code would still be unacceptable because it is completely not obvious what its meaning is! Code shouldn't be a bunch of clever tricks. Write code for the person coming after you who has to read and understand it.

3) Again, suppose that it works. The code is still unacceptable because this is just plain false:

I stumbled into this code for swapping two integers without using a Temporary variable or the use of bit-wise operators.

That's simply false. This trick uses a temporary variable to store the computation of a+b. The variable is generated by the compiler on your behalf and not given a name, but it's there. If the goal is to eliminate temporaries, this makes it worse, not better! And why would you want to eliminate temporaries in the first place? They're cheap!

4) This only works for integers. Lots of things need to be swapped other than integers.

In short, spend your time concentrating on writing code that is obviously correct, rather than trying to come up with clever tricks that actually make things worse.

haccks
  • 104,019
  • 25
  • 176
  • 264
Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
32

There are at least two problems with a=(a+b)-(b=a).

One you mention yourself: the lack of sequence points means that the behavior is undefined. As such, anything at all could happen. For example, there is no guarantee of which is evaluated first: a+b or b=a. The compiler may choose to generate code for the assignment first, or do something completely different.

Another problem is the fact that the overflow of signed arithmetic is undefined behavior. If a+b overflows there is no guarantee of the results; even an exception might be thrown.

Joni
  • 108,737
  • 14
  • 143
  • 193
  • The lack of sequence points means this is UB. There is no reason to expect just one of two possible outcomes. The result might, e.g., depend on the value previously in some scratch register. – Christopher Creutzig Dec 27 '13 at 18:00
  • 5
    @mafso: False. And wrong quote. The quote you provided applies to *integral arithmetic conversions* only, not to signed arithmetic in general. That's how it has always been with *conversions*. Meanwhile, overflow during *signed integer arithmetic* triggers undefined behavior. C11 did not make any changes in that regard. The answer above is absolutely correct. 6.5/5 "If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined." – AnT stands with Russia Dec 27 '13 at 19:01
  • 1
    @Joni: Overflow during signed integer arithmetic causes undefined behavior in both C and C++. Both languages agree on that. The comment made by mafso is incorrect. The quote in that comment is taken from an irrelevant part of the language spec. – AnT stands with Russia Dec 27 '13 at 19:06
  • You were right… it's UB, not implementation-defined at least in C. Sorry. – mafso Dec 27 '13 at 19:11
  • Thank you for that @AndreyT, I suspected that would be the case after seeing *An example of undefined behavior is the behavior on integer overflow* in C11 3.4.3 but couldn't find the correct place to quote – Joni Dec 27 '13 at 19:14
29

Apart from the other answers, about undefined behavior and style, if you write simple code that just uses a temporary variable the compiler can likely trace the values and not actually swap them in the generated code, and just use the swapped values later on in some cases. It cant do that with your code. The compiler is usually better than you at micro optimizations.

So it's likely your code is slower, harder to understand, and probably unreliable undefined behavior too.

jcoder
  • 29,554
  • 19
  • 87
  • 130
  • 10
    This is the answer I think. Modern compilers are very smart, and some architectures have speciallized routes in its pipeline to do swaps, because a swap its a very common operation. So the answer is: Just use `std::swap()` and let the compiler decide whats more efficient. – Manu343726 Dec 27 '13 at 12:36
  • I should have added that, @Manu343726 good point. If you use std::swap then *if* using that trick is faster the compiler writer will hopefully have specialized std::swap to do that trick anyway. – jcoder Dec 27 '13 at 13:05
28

If you use gcc and -Wall the compiler already warns you

a.c:3:26: warning: operation on ‘b’ may be undefined [-Wsequence-point]

Whether to use such a construct is debatable from an performance point as well. When you look at

void swap1(int *a, int *b)
{
    *a = (*a + *b) - (*b = *a);
}

void swap2(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

and examine the assembly code

swap1:
.LFB0:
    .cfi_startproc
    movl    (%rdi), %edx
    movl    (%rsi), %eax
    movl    %edx, (%rsi)
    movl    %eax, (%rdi)
    ret
    .cfi_endproc

swap2:
.LFB1:
    .cfi_startproc
    movl    (%rdi), %eax
    movl    (%rsi), %edx
    movl    %edx, (%rdi)
    movl    %eax, (%rsi)
    ret
    .cfi_endproc

you can see no benefit for obfuscating the code.


Looking at C++ (g++) code, which does basically the same, but takes move into account

#include <algorithm>

void swap3(int *a, int *b)
{
    std::swap(*a, *b);
}

gives identical assembly output

_Z5swap3PiS_:
.LFB417:
    .cfi_startproc
    movl    (%rdi), %eax
    movl    (%rsi), %edx
    movl    %edx, (%rdi)
    movl    %eax, (%rsi)
    ret
    .cfi_endproc

Taking gcc's warning into account and seeing no technical gain, I would say, stick with standard techniques. If this ever becomes a bottleneck, you can still investigate, how to improve or avoid this small piece of code.

Olaf Dietsche
  • 72,253
  • 8
  • 102
  • 198
8

The statement:

a=(a+b)-(b=a);

invokes undefined behavior. The second shall in the quoted paragraph is violated:

(C99, 6.5p2) "Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored."

ouah
  • 142,963
  • 15
  • 272
  • 331
8

A question was posted back in 2010 with the exact same example.

a = (a+b) - (b=a);

Steve Jessop warns against it:

Behaviour of that code is undefined, by the way. Both a and b are read and written without an intervening sequence point. For starters, the compiler would be well within its rights to evaluate b=a before evaluating a+b.

Here's an explanation from a question posted in 2012. Note that the sample is not exactly the same because of the lack of parentheses, but the answer is still relevant nonetheless.

In C++, subexpressions in arithmetic expressions do not have temporal ordering.

a = x + y;

Is x evaluated first, or y? The compiler can choose either, or it can choose something completely different. The order of evaluation is not the same thing as operator precedence: operator precedence is strictly defined, and order of evaluation is only defined to the granularity that your program has sequence points.

In fact, on some architectures it is possible to emit code that evaluates both x and y at the same time -- for example, VLIW architectures.

Now for C11 standard quotes from N1570:

Annex J.1/1

It is unspecified behavior when:

— The order in which subexpressions are evaluated and the order in which side effects take place, except as specified for the function-call (), &&, ||, ? :, and comma operators (6.5).

— The order in which the operands of an assignment operator are evaluated (6.5.16).

Annex J.2/1

It is undefined behavior when:

— A side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object (6.5).

6.5/1

An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.

6.5/2

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.84)

6.5/3

The grouping of operators and operands is indicated by the syntax.85) Except as specified later, side effects and value computations of subexpressions are unsequenced.86)

You shouldn't rely on undefined behavior.

Some alternatives: In C++ you can use

  std::swap(a, b);

XOR-swap:

  a = a^b;
  b = a^b;
  a = a^b;
Community
  • 1
  • 1
5

You can use XOR swap algorithm to prevent any overflow issues and still have a one-liner.

But, since you've got a c++ tag, I'd prefer just a simple std::swap(a, b) to make it more easily readable.

haccks
  • 104,019
  • 25
  • 176
  • 264
dreamzor
  • 5,795
  • 4
  • 41
  • 61
  • 12
    This does not answer the question. The question specifically asks whether the code shown is acceptable, not whether something else is better. – Eric Postpischil Dec 27 '13 at 12:28
  • 3
    Well, it kinda does answer the 'why is a=(a+b)-(b=a) a bad choice for swapping two integers?' question, considering the tags... – dreamzor Dec 27 '13 at 12:30
  • 2
    How does it kind of answer the question? I, for one, cannot see it. – juanchopanza Dec 27 '13 at 12:32
  • ashfaque wishes to swap integers without an additional swap variable. Other posts note sequence problems and overflow problems with proposed method. XOR swap works without either of these problems. I think it does answer the question. – Robert Jacobs Dec 27 '13 at 14:02
  • 1
    @RobertJacobs from the original question: "My Question is - Is this an acceptable solution to swap two integers?". – juanchopanza Dec 27 '13 at 14:06
  • 1
    Except that the algorithm presented in the Wikipedia doesn't work in the general case---you can't use it to implement a function, for example, which is called generically (and maybe with both references to the same variable). – James Kanze Dec 27 '13 at 14:21
  • 1
    I would not use it, as its behavior cannot be guaranteed from language to language, compiler to compiler, or even compiler release to release. It might happen to work in some combination of language, compiler vendor, and release; and then break on the next release. – Phil Perry Dec 27 '13 at 17:17
5

The problem is that according to the C++ Standard

Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.

So this expression

a=(a+b)-(b=a);

has undefined behavioir.

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
4

Besides the problems invoked by the other users, this type of swap has another problem: the domain.

What if a+b goes beyond it's limit? Let's say we work with number from 0 to 100. 80+ 60 would go out of range.

haccks
  • 104,019
  • 25
  • 176
  • 264
catalinux
  • 1,462
  • 14
  • 26