
I have an endless loop in a thread, shown below, that calls std::this_thread::sleep_for to delay 10 ms. The duration is a temporary object, std::chrono::milliseconds(10). The call looks normal and typical, and it follows some sample code. Looking a bit closer, however, the temporary duration object is created and destroyed once per cycle.

// Loop A.
for (;;)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    // Do something.
}

Now if the duration object is created outside the loop (as a constant), it is constructed only once for all the cycles. See the code below.

// Loop B.
const auto t = std::chrono::milliseconds(10);
for (;;)
{
    std::this_thread::sleep_for(t);
    // Do something.
}

Question: Since std::this_thread::sleep_for takes its argument by "const &", will any C++ compiler optimize the temporary duration object in Loop A into something like Loop B?

I tried the simple test program below. The result shows that VC++ 2013 does not optimize away the "const &" temporary object.

#include <iostream>
#include <thread>

using namespace std;

class A {
public:
    A() { cout << "Ctor.\n"; }
    void ReadOnly() const {}  // Read-only method.
};

static void Foo(const A & a)
{
    a.ReadOnly();
}

int main()
{
    cout << "Temp object:\n";
    for (int i = 0; i < 3; ++i)
    {
        Foo(A());
    }

    cout << "Optimized:\n";
    const auto ca = A();
    for (int i = 0; i < 3; ++i)
    {
        Foo(ca);
    }
}
/* VC2013 Output:
Temp object:
Ctor.
Ctor.
Ctor.
Optimized:
Ctor.
*/
Shafik Yaghmour
Garland

2 Answers


MSVC and other modern compilers are perfectly able to optimize temporary objects in loops.

The problem in your example is that you have a side effect in the constructor. According to the C++ standard, the compiler is then not allowed to optimize away the creation/destruction of your temporary object, since the program would no longer produce the same observable effects (i.e. printing "Ctor." three times).

The picture is completely different if the constructor no longer prints anything. Of course, you'll have to look at the generated assembly code to verify the optimisation.

Example:

#include <iostream>
using namespace std;

class A {
public:
    static int k;
    A() { k++; }
    void ReadOnly() const {}  // Read-only method.
};
int A::k = 0;

// Foo is unchanged from the question.
static void Foo(const A & a)
{
    a.ReadOnly();
}

int main()
{
    for(int i = 0; i < 3; ++i)
        Foo(A());  // k++ is a side effect, but not yet observable
    volatile int x = A::k;     // volatile can't be optimized away

    const auto ca = A();
    for(int i = 0; i < 3; ++i)
        Foo(ca);
    x = A::k;     // volatile can't be optimized away
    cout << x << endl;  // Prints 4: three temporaries plus ca.
}

The optimizer noticed perfectly well that it's the same static variable that gets incremented and that it's not used elsewhere. So here is the generated assembly code (extract):

mov eax, DWORD PTR ?k@A@@2HA        ; A::k     <=== load K 
add eax, 3                                     <=== add 3 to it (NO LOOP !!!)
mov DWORD PTR ?k@A@@2HA, eax        ; A::k     <=== store k
mov DWORD PTR _x$[ebp], eax                    <=== store a copy in x 
inc eax                                        <=== increment k
                                               <=== (no loop since function doesn't perform anything)
mov DWORD PTR ?k@A@@2HA, eax        ; A::k     <=== store it 
mov DWORD PTR _x$[ebp], eax                    <=== copy it to x

Of course, you need to compile in release mode.

As you can see, the compiler is very clever. So let it do its job, stay focused on your code design, and keep in mind: premature optimization is the root of all evil ;-)

Christophe

Assuming the compiler "understands" what the constructor does (in other words, the definition of the constructor is available in the translation unit, that is, in the source file or one of the header files it includes), the compiler should remove superfluous calls to a constructor that has no side effects.

Since printing something is a very definite side effect of your A constructor, the compiler clearly can't optimize that out. So the compiler is doing exactly the "right" thing here. It would be VERY bad if you had, for example, a constructor that takes a lock and a destructor that releases it, and the compiler decided to move the lock object in this loop:

for(...)
{
   LockWrapper lock_it(theLock);
   ... some code here
}

to outside the loop. Although the overhead of taking and releasing the lock would be lower, the semantics of the code would change: the lock would potentially be held MUCH longer, which would affect OTHER code using the same lock, for example in a different thread.

Mats Petersson