I am very confused by one thing: if I add a constructor to struct A, the computation in the for loop becomes many times slower. Why? I have no idea.

On my computer, the snippet prints the following times (in milliseconds):

With constructor: 1351

Without constructor: 220

Here is the code:

#include <iostream>
#include <chrono>
#include <cmath>

using namespace std;
using namespace std::chrono;

const int SIZE = 1024 * 1024 * 32;

using type = int;

struct A {
    type a1[SIZE];
    type a2[SIZE];
    type a3[SIZE];
    type a4[SIZE];
    type a5[SIZE];
    type a6[SIZE];

    A() {} // comment out this line and the loop becomes several times faster
};

int main() {
    A* a = new A();
    int r;
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    for (int i = 0; i < SIZE; i++) {
        r = sin(a->a1[i] * a->a2[i] * a->a3[i] * a->a4[i] * a->a5[i] * a->a6[i]);
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();

    cout << duration_cast<milliseconds>(t2 - t1).count() << ": " << r << endl;

    delete a;

    system("pause");
    return 0;
}

However, if I remove the sin() call from the for loop, like this:

for (int i = 0; i < SIZE; i++) {
    r = a->a1[i] * a->a2[i] * a->a3[i] * a->a4[i] * a->a5[i] * a->a6[i];
}

then removing the constructor makes no difference: the execution time is the same in both cases, 78.

Do you see similar behaviour with this code? Do you know the reason for it?

EDIT: I compile it with Visual Studio 2013

Peter Cordes
daretsuki
    Have you turned on optimization to the max? Have you investigated the difference in machine code produced? – Volodymyr Lashko Jul 14 '17 at 09:04
  • 1
    Testing performance without optimization is not useful. I'm assuming you didn't turn on optimization because if you did then the loop would probably be optimized out completely. – interjay Jul 14 '17 at 09:06
  • 9
    A possible source of the problem is that without a constructor the members of the structure will be *initialized* when the object is created. With the constructor the member variables will no longer automatically be initialized, since it's the job of the constructor, and since they are not initialized using their values will lead to *undefined behavior*. – Some programmer dude Jul 14 '17 at 09:12
  • 3
    unrelated: I would use typedefs either consistently or not at all. If you change `type` to `double` then `r` is still `int` – 463035818_is_not_an_ai Jul 14 '17 at 09:20
  • 2
    In order to speculate on the result of undefined behavior (caused by the use of uninitialized values) people will have to know exactly what your compiler, OS and CPU are. – qPCR4vir Jul 14 '17 at 09:22
  • 2
    Well, Some programmer dude already pointed out the main issue in your test: in one case the array is initialized, in the other it isn't. Initialize all your variables after the call to "A* a = new A();" and see if the difference still exists – Nicolae Natea Jul 14 '17 at 09:24
  • Man, please supply info regarding your compiler and the compiler flags you set. – Michael IV Jul 14 '17 at 09:24
  • 1
    Ok sorry for shortcomings in my post. As you said it looks the answer is about initialization. Thank you for quick response. – daretsuki Jul 14 '17 at 09:28
  • 1
    I'd speculate that `sin` does some branching in there, and when you call `new A()` without a constructor it will zero everything and makes branch prediction a breeze. – Passer By Jul 14 '17 at 09:48
  • 1
    You have removed assign of `r = ` with `sin()` in last code block, so I think, that whole for loop is optimized out by compiler, because it does nothing. Try comparing time with `r = a->a1[i] * a->a2[i] * a->a3[i] * a->a4[i] * a->a5[i] * a->a6[i];` instead – Mischo5500 Jul 14 '17 at 09:53
  • Which compiler do you use? – geza Jul 14 '17 at 10:40
  • @Mischo5500 you're right - I fixed it. I compile it with the Visual Studio 2013 compiler - whatever it is... And the question remains: why does the problem not occur after removing sin? – daretsuki Jul 14 '17 at 13:18
  • You haven't said what the times you're getting are, but you should know the high_resolution_clock in Visual Studio 2013 isn't good. Consider upgrading to a newer version or using an alternative. https://stackoverflow.com/questions/23063759/time-for-an-algorithm-to-run/23078292#23078292 With Visual Studio 2017 I get ~210 with and ~310 without, not a 10x difference as you claim. – Retired Ninja Jul 14 '17 at 13:28
  • oh, with full optimization it is actually 5x, but still a difference – daretsuki Jul 14 '17 at 13:42
  • A default constructor should be faster than a custom one, have a look [here](https://stackoverflow.com/questions/45099019/why-c-use-memsetaddr-0-sizeoft-to-construct-a-object-standard-or-compiler) to see why – Jeka Jul 14 '17 at 13:55
  • 1
    Your code is wrong: it has undefined behavior (uninitialized values are used), and the `for` loop is useless, so the compiler strips it. When the compiler can figure out that everything is zero-initialized, it completely removes the code under test and you are measuring nothing. Here is a [godbolt](https://godbolt.org/z/n31hTq3h9) – Marek R Feb 13 '22 at 20:54
  • Please watch this, it will explain a lot: https://youtu.be/9BM5LAvNtus – Marek R Feb 13 '22 at 21:06

1 Answer

Yes, this behavior is still reproducible in Visual Studio 2019 when compiling in the Release configuration (with optimization).

If struct A has an empty user-provided constructor, its fields remain uninitialized after new A().

On the other hand, if struct A has no constructor, it is an aggregate, and new A() value-initializes it, which fills its fields with zeros.
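The difference between the two cases can be sketched with a pair of small types. This is only an illustration, not the code from the question: `NoCtor`, `WithCtor`, `kCount` and `value_init_is_zeroed` are hypothetical names, and the array is kept tiny. It deliberately never reads the members of `WithCtor`, since reading them would be undefined behavior.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

constexpr std::size_t kCount = 1024;  // small stand-in for SIZE (assumption)

struct NoCtor {    // no user-provided constructor
    int data[kCount];
};

struct WithCtor {  // empty user-provided constructor, as in the question
    int data[kCount];
    WithCtor() {}
};

// `new NoCtor()` value-initializes the object. Because NoCtor has no
// user-provided constructor, that means zero-initialization of every element.
inline bool value_init_is_zeroed() {
    std::unique_ptr<NoCtor> p(new NoCtor());
    for (std::size_t i = 0; i < kCount; ++i) {
        if (p->data[i] != 0) return false;
    }
    return true;
}

// The two types differ in whether default construction is trivial.
static_assert(std::is_trivially_default_constructible<NoCtor>::value,
              "NoCtor has a trivial default constructor");
static_assert(!std::is_trivially_default_constructible<WithCtor>::value,
              "WithCtor has a user-provided default constructor");
```

Calling `value_init_is_zeroed()` should return `true` on a conforming compiler; for `new WithCtor()` no such guarantee exists, because the empty constructor body leaves `data` untouched.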

The multiplications followed by the sine take the same time regardless of the input values (as long as they are not denormalized, which is not the case here). However, zero-filling the fields brings them into the CPU cache, so the computation that follows runs faster. That explains the "benefit" of the no-constructor version, provided, of course, that the time spent constructing the object is not included in the measurement.

If you keep the empty constructor and then manually fill the object with zeros:

    A* a = new A();
    for (int i = 0; i < SIZE; i++)
        a->a1[i] = a->a2[i] = a->a3[i] = a->a4[i] = a->a5[i] = a->a6[i] = 0;

then the program is as fast as in the case with no constructor in A.
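To see how much of the time goes into the first touch of the memory, the fill and the computation can be timed separately. The following is a minimal sketch under stated assumptions, not the benchmark from the question: `Buffers`, `fill`, `sum_products`, `Timing`, `run_once` and `kElems` are hypothetical names, only two arrays are used, the size is much smaller than SIZE, and `std::chrono::steady_clock` is swapped in for `high_resolution_clock`.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>

constexpr std::size_t kElems = 1 << 20;  // much smaller than SIZE (assumption)

struct Buffers {
    int a1[kElems];
    int a2[kElems];
    Buffers() {}  // empty user-provided constructor: members start uninitialized
};

// Write every element before the measured loop, so the cost of touching the
// memory for the first time is paid here and not inside the computation.
inline void fill(Buffers& b, int value) {
    for (std::size_t i = 0; i < kElems; ++i) {
        b.a1[i] = value;
        b.a2[i] = value;
    }
}

// The measured work. It returns a checksum so the loop has an observable
// result and only ever reads initialized data.
inline long long sum_products(const Buffers& b) {
    long long sum = 0;
    for (std::size_t i = 0; i < kElems; ++i) {
        sum += static_cast<long long>(b.a1[i]) * b.a2[i];
    }
    return sum;
}

struct Timing {
    long long init_us;   // time spent filling the arrays
    long long work_us;   // time spent in the computation
    long long checksum;  // result of the computation
};

inline Timing run_once(int value) {
    using clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    std::unique_ptr<Buffers> b(new Buffers());
    const auto t0 = clock::now();
    fill(*b, value);
    const auto t1 = clock::now();
    const long long checksum = sum_products(*b);
    const auto t2 = clock::now();

    return Timing{duration_cast<microseconds>(t1 - t0).count(),
                  duration_cast<microseconds>(t2 - t1).count(),
                  checksum};
}
```

A caller could print `run_once(0).init_us` and `run_once(0).work_us` side by side. Reporting both numbers makes it visible when the initialization cost has simply moved from one phase to the other.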

Fedor
  • 1
    Why do zeroes matter here? Does that let the compiler optimize away some work because anything times zero is zero? Or is it using the legacy 32-bit mode x87 FPU which is slower for some inputs like NaN, unlike SSE? ([Huge performance difference (26x faster) when compiling for 32 and 64 bits](https://stackoverflow.com/a/31879376) shows that's the case on a Nehalem CPU, for example) – Peter Cordes Feb 13 '22 at 20:37
  • @PeterCordes The timing of many mathematical operations depends on the actual values (iterative computation) – Phil1970 Feb 13 '22 at 20:52
  • 1
    There is UB (uninitialized values used) and when zero-initialization is done the test code is completely removed: https://godbolt.org/z/n31hTq3h9 – Marek R Feb 13 '22 at 20:55
  • 1
    Interestingly, MSVC is not as smart as gcc, and the code under test is exactly the same: https://godbolt.org/z/WPv3x1fGo – Marek R Feb 13 '22 at 21:04
  • @Phil1970: Maybe for the `sin` library function, but other than NaN or subnormals, hardware FP multiply is constant-time, not data dependent. (And with SSE for scalar math, even NaN is constant time, leaving only subnormals as potentially slow.) That was the point of my comment you're replying to, to ask whether it was the same asm code with data-dependent performance, or whether work got optimized away. (Hardware FP division / sqrt is data-dependent on some CPUs even for finite inputs, especially on older x86.) – Peter Cordes Feb 13 '22 at 21:10
  • 1
    Ah, now *that* makes sense; writing the values outside the timed region will also get page faults (from lazy allocation) out of the way, as well as TLB misses and priming the cache. One of the major factors I mentioned in my answer on [Idiomatic way of performance evaluation?](https://stackoverflow.com/q/60291987) – Peter Cordes Feb 19 '22 at 21:46