I know there is a similar question about this: constexpr performing worse at runtime.
But my case is a lot simpler than that one, and the answers there were not enough for me. I'm just learning about constexpr in C++11 and wrote some code to compare its efficiency. For some reason, using constexpr makes my code run more than 4 times slower!
By the way, I'm using exactly the same example as on this site: https://www.embarcados.com.br/introducao-ao-cpp11/ (it's in Portuguese, but you can see the example code about constexpr). I have already tried other expressions and the results are similar.

#include <ctime>
#include <iostream>
using namespace std;

constexpr double divideC(double num){
    return (2.0 * num + 10.0) / 0.8;
}

#define SIZE 1000
int main(int argc, char const *argv[])
{
    // Get number of iterations from user
    unsigned long long count;
    cin >> count;
    
    double values[SIZE];

    // Testing normal expression
    clock_t time1 = clock();
    for (int i = 0; i < count; i++)
    {
        values[i%SIZE] = (2.0 * 3.0 + 10.0) / 0.8;
    }
    time1 = clock() - time1;
    cout << "Time1: " << float(time1)/float(CLOCKS_PER_SEC) << " seconds" << endl;
    
    // Testing constexpr
    clock_t time2 = clock();
    for (int i = 0; i < count; i++)
    {
        values[i%SIZE] = divideC( 3.0 );
    }
    time2 = clock() - time2;
    cout << "Time2: " << float(time2)/float(CLOCKS_PER_SEC) << " seconds" << endl;

    return 0;
}

Input given: 9999999999

Output:

> Time1: 5.768 seconds
> Time2: 27.259 seconds

Can someone tell me the reason for this? Since constexpr calculations should be done at compile time, I expected the constexpr loop to run faster, not slower.

I'm using msbuild version 16.6.0.22303 to compile the Visual Studio project generated by the following CMake code:

cmake_minimum_required(VERSION 3.1.3)
project(C++11Tests)

add_executable(Cpp11Tests main.cpp)

set_property(TARGET Cpp11Tests PROPERTY CXX_STANDARD_REQUIRED ON)
set_property(TARGET Cpp11Tests PROPERTY CXX_STANDARD 11)
  • Do you compile with optimizations enabled? – HolyBlackCat Jul 12 '20 at 11:53
  • Yes, please identify the compiler, version, Standard option, and full compile/link command lines. – underscore_d Jul 12 '20 at 11:55
  • ...and no, just "Visual Studio compiler" is far from sufficient. – underscore_d Jul 12 '20 at 11:57
  • Added more edits about compiler. Is it enough? – viniciusPintoF Jul 12 '20 at 12:00
  • There's no evidence that you're compiling with optimisations enabled, and if you're not, there is no meaning to measurements of performance. – underscore_d Jul 12 '20 at 12:04
  • Without optimizations, the compiler will keep the `divideC` call, so it is slower. With optimizations on, the compiler knows that everything related to `values` can be optimized away without any side effects. So the shown code can never give a meaningful measurement of the difference between `values[i%SIZE] = (2.0 * 3.0 + 10.0) / 0.8;` and `values[i%SIZE] = divideC( 3.0 );` – t.niese Jul 12 '20 at 12:10
  • I understand. I didn't know it was required to use the optimisation options. Actually I just ran the same code compiled with g++ optimisation -O1 and in fact the constexpr part ran a little faster! Thank you! – viniciusPintoF Jul 12 '20 at 12:14
  • @viniciusPintoF `[…] and in fact the constexpr part ran a little faster[…]` no, it does not run faster, because with `-O1` the `constexpr` part does not even exist in the generated code. You only measure the loop and some _"noise"_ when calling the `clock` function. – t.niese Jul 12 '20 at 12:38
  • @t.niese I understand that `constexpr` isn't a thing in the final machine code. It so happens that I tested a few times and the results were a bit faster indeed. But I think it is a coincidence, since compiling with `-S` shows that the two parts generated the same code, not performing the actual calculations at runtime for either of the loops. – viniciusPintoF Jul 12 '20 at 12:55
  • @viniciusPintoF `[…]not performing the actual calculations in runtime for any of the loops[…]` yes, that is exactly the problem, and what I wrote in my answer. – t.niese Jul 12 '20 at 12:59
  • The times were identical on my machine. – Eljay Jul 12 '20 at 14:52

1 Answer

Without optimizations, the compiler will keep the divideC call so it is slower.
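
One way to see why an unoptimized build keeps the call: constexpr on a function only allows compile-time evaluation. The compiler is required to evaluate the call during compilation only where a constant expression is needed, for example when initializing a constexpr variable. A minimal sketch of the difference, where kAtCompileTime and at_run_time are illustrative names that are not part of the question's code:

```cpp
// Same function as in the question.
constexpr double divideC(double num) {
    return (2.0 * num + 10.0) / 0.8;
}

// A constexpr variable must be initialized by a constant expression, so the
// compiler has to evaluate this call itself, with or without optimizations.
constexpr double kAtCompileTime = divideC(3.0);

// static_assert also needs a constant expression; the message is shown only
// if the check fails.
static_assert(kAtCompileTime > 19.9 && kAtCompileTime < 20.1,
              "divideC(3.0) should be approximately 20");

// Illustrative helper (not in the question): here divideC is used in an
// ordinary context with a run-time argument, so it behaves like any other
// inline function and may be emitted as a real call in a debug build.
double at_run_time(double x) {
    return divideC(x);
}
```

The assignment values[i%SIZE] = divideC( 3.0 ); in the question falls into the second category: nothing there requires a constant expression, so whether the call is folded is left to the optimizer.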

With optimizations on, any decent compiler knows that, for the given code, everything related to values has no observable effect and can be optimized away. So the shown code can never give a meaningful measurement of the difference between values[i%SIZE] = (2.0 * 3.0 + 10.0) / 0.8; and values[i%SIZE] = divideC( 3.0 );

With -O1, any decent compiler will generate something like this:

    for (int i = 0; i < count; i++)
    {
        values[i%SIZE] = (2.0 * 3.0 + 10.0) / 0.8;
    }

results in:

        mov     rdx, QWORD PTR [rsp+8]
        test    rdx, rdx
        je      .L2
        mov     eax, 0
.L3:
        add     eax, 1
        cmp     edx, eax
        jne     .L3
.L2:

and

    for (int i = 0; i < count; i++)
    {
        values[i%SIZE] = divideC( 3.0 );
    }

results in:

        mov     rdx, QWORD PTR [rsp+8]
        test    rdx, rdx
        je      .L4
        mov     eax, 0
.L5:
        add     eax, 1
        cmp     edx, eax
        jne     .L5
.L4:

So both loops result in identical machine code that contains only the loop counter and nothing else. As soon as you turn on optimizations, you measure only the loop and nothing related to constexpr.

With -O2, even the loop is optimized away, and you would only measure:

    clock_t time1 = clock();
    time1 = clock() - time1;
    cout << "Time1: " << float(time1)/float(CLOCKS_PER_SEC) << " seconds" << endl;
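
If the goal is a measurement that survives optimization, the stores have to be observable so the compiler cannot drop them. The following is only a sketch under assumptions, not a rigorous benchmark: sink, kSize, time_loop, time_literal and time_constexpr are hypothetical names, and std::chrono::steady_clock is swapped in for clock(). The counter is unsigned long long so that it can actually reach an input such as 9999999999, which does not fit in the int counter of the question's loops where int is 32 bits.

```cpp
#include <chrono>
#include <cstddef>

// Same function as in the question.
constexpr double divideC(double num) {
    return (2.0 * num + 10.0) / 0.8;
}

// Hypothetical buffer size, mirroring SIZE from the question.
constexpr std::size_t kSize = 1000;

// Writes to a volatile object are observable side effects, so the compiler
// has to keep every store even when optimizations are enabled.
volatile double sink[kSize];

// Hypothetical helper: calls body(i) count times, returns elapsed seconds.
template <typename Body>
double time_loop(unsigned long long count, Body body) {
    const auto start = std::chrono::steady_clock::now();
    for (unsigned long long i = 0; i < count; ++i) {
        body(i);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Stores the literal expression on every iteration.
double time_literal(unsigned long long count) {
    return time_loop(count, [](unsigned long long i) {
        sink[i % kSize] = (2.0 * 3.0 + 10.0) / 0.8;
    });
}

// Stores the result of the constexpr function on every iteration.
double time_constexpr(unsigned long long count) {
    return time_loop(count, [](unsigned long long i) {
        sink[i % kSize] = divideC(3.0);
    });
}
```

Calling time_literal(n) and time_constexpr(n) from main with the same n and printing both results gives the comparison. With optimizations enabled, both loops would be expected to store the same precomputed constant, so any remaining difference is likely measurement noise rather than an effect of constexpr.
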
t.niese