2

I was looking for alternate options to swap two numbers and came across the link How to swap two numbers

In the comments section its been mentioned that using temporary variable is better. Below is the comment I copied form the link

If we look at the problem at the CPU instructions perspective, use tmp will be better than all above 3 method, i have run a benchmark agains all those 4 method (including the 4th by ?using temp variable). without surprise, the 4th way beats all above 3 method. And the reason is how CPU move the variable into register and how many register we need to use.

But I am not able to find a clue on how it works. Can someone explain me how it works at the processor level and why temp variable is better (if it is)?

Mogsdad
  • 44,709
  • 21
  • 151
  • 275
Tamil
  • 1,173
  • 1
  • 13
  • 35

4 Answers4

3

The only way to see what kind of optimisation happens at this level is to compile and disassemble. It turns out that the compiler is already very good at removing or reinterpreting your code to make it faster.

I compiled this code using the MS C compiler:

int main() 
{
        int a = 1;
        int b = 2;
        int c;

        // Force use of the variables so they aren't optimized away
        printf("a = %d, b = %d\n", a, b);

        c = b;
        b = a;
        a = c;

        // Force use again
        printf("a = %d, b = %d\n", a, b);

        return 0;
}

This is the actual output after optimisations, edited for brevity:

; 4    :    int a = 1;
; 5    :    int b = 2;
; 6    :    int c;

; OPTIMISED AWAY

; 8    :    printf("a = %d, b = %d\n", a, b);

    push    2
    push    1
    push    pointer_to_string_constant
    call    DWORD PTR __imp__printf

; 10   :    c = b;
; 11   :    b = a;
; 12   :    a = c;

; OPTIMISED AWAY

; 14   :    printf("a = %d, b = %d\n", a, b);

    push    1
    push    2
    push    pointer_to_string_constant
    call    DWORD PTR __imp__printf

; 16   :    return 0;

    xor eax, eax ; Just a faster way of saying "eax = 0;"

; 17   : }

    ret 0

So you see, the compiler decided in this case not to use any variables at all and just push the integers directly to the stack (this is the same as passing arguments to a function in C).

The moral of this story is not to second guess the compiler when it comes to micro-optimisations.

jforberg
  • 6,537
  • 3
  • 29
  • 47
  • Thank you. This is helpful. Can you also help me with the details for the XOR swapping? – Tamil Aug 12 '14 at 11:03
  • 1
    @Tamil Why don't you try it yourself on your own compiler? This is what you should be doing every time you consider doing micro-optimisation anyway. If you don't know how to do it I can help out, just tell me what compiler you use. – jforberg Aug 12 '14 at 11:07
  • I'm not familiar with Xcode but this answer may be helpful http://stackoverflow.com/questions/5433935/how-can-i-examine-assembly-in-xcode-4 – jforberg Aug 12 '14 at 11:15
  • I am able to view the assembly code within Xcode itself. From menubar : product -> Perform action -> Assemble . I have added the assembly instructions below. Looks like temp variable is better than xor (may be specific to the xcode compiler) – Tamil Aug 13 '14 at 19:25
  • @Tamil Good. Consider accepting my answer if you found it helpful. – jforberg Aug 15 '14 at 08:13
  • FYI, for swapping two globals, [gcc and clang can see through the +/- stupid swap idiom, but only clang optimizes away an xor-swap](https://godbolt.org/g/3leK2S). – Peter Cordes Apr 23 '16 at 03:42
  • Update: [gcc can optimize away an xor-swaps between members in a local struct, it's only for globals where gcc does worse than clang](https://godbolt.org/g/MFPvRe). – Peter Cordes Apr 23 '16 at 03:58
1

Thanks everyone for your valuable inputs. I wrote a simple program and viewed the assembly output (it's a C program compiled in Xcode). Looks like XOR operation is twice as costly than using a temporary variable. Attached both the generated assembly code images. When I use temporary variable each operation (a = b & so) takes two instructions. But when I use XOR each operation (a ^= b & so) takes 4 instructions. Below is my program for reference. The screenshot only contains the assembly code for lines within main (except return 0)

#include <stdio.h>

int main(int argc, const char * argv[])
{

int a = 100;
int b = 200;
int c;

c = a;
a = b;
b = c;

return 0;
}

Using temp variable c

#include <stdio.h>

int main(int argc, const char * argv[])
{

int a = 100;
int b = 200;

a ^= b;
b ^= a;
a ^= b;

return 0;
}

Swap using XOR operation

These results may be specific to MAC runtime and/or the way Xcode compiles. If you think I missed something during this process do post your comment.

Mogsdad
  • 44,709
  • 21
  • 151
  • 275
Tamil
  • 1,173
  • 1
  • 13
  • 35
  • This is useless compiler output from `-O0`, which doesn't tell us anything about what optimized code will look like. Neither of those functions have any side effects, so they should both compiler the same as just `return 0;`. I tested gcc and clang on +/- swap and xor swap of two globals, and only clang was able to optimize both stupid swap idioms. [gcc optimizes the +/-, but still emits 3 `xor`s for swapping globals, but can optimize away the xor when swapping members in a local struct arg before returning it](https://godbolt.org/g/MFPvRe). – Peter Cordes Apr 23 '16 at 03:56
-1

The "without temp variable" solutions have an arithmetic or logical operation per each statement apart from the assignment. For example,

      x = x + y;  
      y = x - y; 
      x = x - y; 

The temporary variable solution will simply have assignments.

temp = a;
a    = b;
b    = temp;

So the temporary variable solution will obviously have better performance in terms of CPU.

  • 1
    Downvote: you are wrong sir. I compiled your example and it resulted in **exactly** the same machine code as the other example. Don't underestimate the compiler. – jforberg Aug 12 '14 at 11:03
  • Confirmed @jforberg's point for [gcc and clang](https://godbolt.org/g/FLvzR3): swapping the contents of two integer globals just compiles to two loads / two stores either way. The `temp` way is safer, and still optimizes with floats, so it's highly preferable. But the main reason is apparently readability / source matching the operation. (The compiler output is a lot more like the temp-using source.) – Peter Cordes Apr 23 '16 at 03:41
-1

If we use below method

x = x + y;  
y = x - y; 
x = x - y; 

These instructions includes arithmetic operations along with assignments. It requires more cpu cycles to complete swapping.

If you use a temporary variable to swap two values, it don't need any arithmetic operations. It just do it by assigning values. This requires less CPU cycles as compared to above solution.

Chinna
  • 3,930
  • 4
  • 25
  • 55