I looked around and combined a basic temporary-variable swap with a template restricted to arithmetic types. Why is this faster than std::swap?

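#include <type_traits>  // std::enable_if, std::is_arithmetic

// Enabled only for arithmetic types (int, double, ...).
template <typename T, typename std::enable_if<std::is_arithmetic<T>::value>::type* = nullptr>
void swp(T& x, T& y) {
    T t = x; x = y; y = t;
}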

Here is the specific code I am using for testing (it first tries to clear the cache for consistency; see this post for info):

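#include <cstdlib>   // rand
#include <ctime>     // std::clock
#include <iostream>
#include <utility>   // std::swap

int main() {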
    const size_t bigger_than_cachesize = 10 * 1024 * 1024;
    long* p = new long[bigger_than_cachesize];
    for (size_t i = 0; i < bigger_than_cachesize; i++) p[i] = rand();
    std::cout << "Cache is flushed..." << std::endl;
    /// IGNORE ABOVE (ATTEMPTING TO CLEAR CACHE FOR CONSISTENCY)

    double duration;
    int x = 2560, y = 435;
    std::clock_t start;
    start = std::clock();

    for(int i = 0; i < 100000000; i++) std::swap(x,y);

    duration = (std::clock() - start);
    std::cout << "std::swap: " << duration << '\n';
    duration = 0;
    start = std::clock();

    for (int i = 0; i < 100000000; i++) swp(x,y);

    duration = (std::clock() - start);
    std::cout << "swapTMP: " << duration << '\n';
}

Results: (roughly 3.6:1 ratio)

std::swap -> 5086
<T> swp   -> 1397
FatalSleep
  • Please show a minimal reproducible example with your full code, compiler settings and performance measurements – Alan Birtles Jul 24 '20 at 06:30
  • The compiler is likely optimizing your arithmetic swap without actually using temporary variables. One would hope that a *decent* `std::swap()` implementation would already be optimized for arithmetic types. – Remy Lebeau Jul 24 '20 at 06:30
  • What do you mean by _seem to be faster_? What experiment did you design? How did you measure runtime? Which compiler, optimizations, architecture, ...? – Daniel Langr Jul 24 '20 at 06:31
  • @DanielLangr x64, VS compiler, standard optimizations. – FatalSleep Jul 24 '20 at 06:36
  • I don't understand your experiment. You allocate a large array but then measure the swapping of two local variables. Both loops have no effect in the end and may be completely optimized away by the compiler. And indeed they are when optimizations are enabled: https://godbolt.org/z/rxv7Px. Moreover, subtracting results of `std::clock` does not give you milliseconds. – Daniel Langr Jul 24 '20 at 06:56
  • @DanielLangr I added missing comments, my apologies... – FatalSleep Jul 24 '20 at 07:01
  • @FatalSleep What are _standard optimizations_? It seems that they are disabled. Measuring performance with optimizations disabled does not make any sense. – Daniel Langr Jul 24 '20 at 07:01
  • @DanielLangr whatever VS2019 uses by default for console applications. – FatalSleep Jul 24 '20 at 07:02
  • I can only reproduce your results in VS2017 by using a debug build; in a release build both measurements are 31 ms. – Alan Birtles Jul 24 '20 at 07:17
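
On the `std::clock` point raised in the comments above: the difference of two `std::clock()` readings is a count of implementation-defined ticks, and dividing by `CLOCKS_PER_SEC` turns it into seconds. A minimal sketch of the conversion, where `ticks_to_ms` is a hypothetical helper name and not something from the question:

```cpp
#include <ctime>

// Hypothetical helper (not from the question): converts the difference
// between two std::clock() readings into milliseconds.
// CLOCKS_PER_SEC is implementation-defined, so raw tick differences
// are only comparable on the same platform.
double ticks_to_ms(std::clock_t start, std::clock_t end) {
    return 1000.0 * static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

In the question's code this would be used as `ticks_to_ms(start, std::clock())` in place of the raw subtraction. Note that `std::clock` is specified to report approximate processor time, so it is a coarse tool for timing something as small as a swap.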

1 Answer

Take a look at the assembly generated for this simple code when optimizations are enabled (-O2).

#include <algorithm>
#include <type_traits>  // std::enable_if, std::is_arithmetic
#include <utility>      // std::swap

int foo(int a, int b) {
    for(int i = 0; i < 100000000; i++) std::swap(a, b);
    return a;
}

template <typename T, typename std::enable_if<std::is_arithmetic<T>::value>::type* = nullptr>
void swp(T& x, T& y) {
    T t = x; x = y; y = t;
}

int bar(int a, int b) {
    for(int i = 0; i < 100000000; i++) swp(a, b);
    return a;
}

Here is godbolt.

The machine code for foo and bar is exactly the same for each compiler.

More importantly, MSVC was able to optimize away the for loops entirely, detecting that they have no observable effect (the "as-if" rule).

So since you get different results, you must be measuring incorrectly.

Remember that testing the performance of functions as small and fast as swap is technically very hard, and it is easy to make a mistake that leads to wrong conclusions.

Basically it looks like you have reached the limit of the time resolution of std::clock().
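
For what it's worth, here is a rough sketch of a measurement that is harder for the optimizer to delete. It assumes `std::chrono::steady_clock` is precise enough on your platform, and `time_swaps_ns` is a made-up helper, not something from the question. The volatile variables force a real load and store on every iteration, so the number includes that overhead and is only meaningful for comparing two swap implementations under the same harness:

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper (not from the question): performs `iterations`
// swaps and returns the elapsed time in nanoseconds. The final values
// are written to out_x / out_y so the caller can check them.
std::int64_t time_swaps_ns(int iterations, int& out_x, int& out_y) {
    // volatile makes every read and write observable behavior, so the
    // compiler cannot fold the loop away under the as-if rule.
    volatile int vx = 2560;
    volatile int vy = 435;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        int x = vx;       // volatile read on every iteration
        int y = vy;
        std::swap(x, y);  // the operation under test
        vx = x;           // volatile write on every iteration
        vy = y;
    }
    const auto stop = std::chrono::steady_clock::now();

    out_x = vx;
    out_y = vy;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A copy of the same function with swp in place of std::swap gives the other side of the comparison. Both should be built with optimizations enabled and run several times before drawing any conclusion.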

Marek R
  • If I include the for loop for adjusting the `x,y` variables then the compiler can't optimize it away. Right? – FatalSleep Jul 24 '20 at 07:07
  • In this case the compiler can't really optimize away the for loop anyways? Since each loop the values are swapped, there shouldn't be anything for the compiler to do. – FatalSleep Jul 24 '20 at 07:09
  • @FatalSleep After the loop, the values of the swapped variables are the same as before the loop. So yes, a compiler can completely optimize the loops away. I suggest learning more about optimizations and the "as-if" rule before making such experiments. – Daniel Langr Jul 24 '20 at 07:11
  • Please provide details: compiler, how you build it and how you run it. – Marek R Jul 24 '20 at 07:11
  • @MarekR pretty basic, default console app settings in VS2019. Haven't changed a thing. – FatalSleep Jul 24 '20 at 07:15
  • @FatalSleep Is it a Debug or Release build? – Blastfurnace Jul 24 '20 at 07:16
  • @Blastfurnace debug and release. Though as Marek mentioned... voila, optimized away. Problem solved. – FatalSleep Jul 24 '20 at 07:21
  • Here is your full code: https://www.godbolt.org/z/hvaYYE. Because of its size it's harder to read, but you can see that the loops have been removed. – Marek R Jul 24 '20 at 07:35
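
To make the "as-if" argument from the comments concrete, here is a small sketch. Both function names are hypothetical and written only for illustration. The first function is the loop from the answer with the iteration count turned into a parameter; the second is the kind of closed form an optimizer is permitted (but not obliged) to substitute, because the two are indistinguishable from the outside:

```cpp
#include <utility>

// Hypothetical illustration: swap a and b n times, return the first value.
int first_after_swaps(int a, int b, int n) {
    for (int i = 0; i < n; ++i) std::swap(a, b);
    return a;
}

// Same observable result with no loop: an odd, positive number of swaps
// exchanges the values; zero, a negative count, or an even count leaves
// them as they were.
int first_after_swaps_folded(int a, int b, int n) {
    return (n > 0 && n % 2 != 0) ? b : a;
}
```

With the question's count of 100000000, which is even, the folded version simply returns a, so nothing is left to time.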