
I'm a Qt/C++ programmer. Lately I've been working on thread safety and I've seen tons of discussion on it. I've grouped the safety issues under four headings:

  1. Atomicity while read/write variables between threads
  2. Compiler optimizations on variables which are shared between threads
  3. Signal interrupt while read/write variables which are shared between threads
  4. Confusion of concurrent read/write of variables which are shared between threads

I have understood all of these except the second item. I find it very hard to reason about because it's not explicit: it depends on compiler behaviour, and how can I predict compiler behaviour? For example:

int i=0;
void foo()
{
    for (;i<100;i++)
    {}
}

According to some resources, in the above code the compiler may keep i in a CPU register until the counting is finished and only then write the final value back to memory. So what happens if another thread tries to read the value of i while the loop is running? It just sees zero until the counting is finished, because the up-to-date value is still in a CPU register. Thus an unexpected state occurs. Let's take a further example:

class MyThread : public QThread
{
    Q_OBJECT

    void run()
    {
         mutex.lock();
         status=true;
         mutex.unlock();
    }

    private:
        QMutex mutex;
        bool status;
};

int main()
{
    MyThread thread;
    thread.start();
    return 0;
}

In the above code, according to the Qt documentation, the member variables of the thread object are owned by the main thread, because the object is created in main(), while the code in run() executes on the second thread. I used the mutex object to serialize access and get atomicity, so far so good. But how can I know that the compiler actually initialized the mutex object in memory before the second thread uses it in run()? The compiler never sees an actual use of the mutex object in the sequential code flow, so it might not store it to memory at all. For example, at compile time the compiler might delete variables that are unused in the sequential flow to save memory, or it might write the member values to memory only after some reordered operations. How can I know? How can we tell whether the compiler has optimized a variable away?

  • Keep in mind that every thread has its own stack. Each thread will be executing your for loop independently. – eoD .J Apr 10 '16 at 01:18
  • The edit to this question completely changed the first part of it, and anyone who spent the time trying to answer it wasted their time. This is a poorly prepared, and poorly phrased question. – Sam Varshavchik Apr 10 '16 at 01:28
  • Sorry if I wasted your time, but I wanted to explain it more. –  Apr 10 '16 at 01:31
  • For a good intro, look at what `volatile` does and does not do. – o11c Apr 10 '16 at 02:56

1 Answer


But how can I know that, the compiler actually initialized mutex object into memory before second thread used it on run() function.

If you were designing your own mutex class from first principles, this would indeed be a problem you'd need to worry about. Not only could the compiler rearrange the ordering of your code as part of its optimization process, but even if the compiler didn't mess things up for you, modern CPUs often rearrange the order in which they execute instructions on-the-fly, so as to better keep their execution pipelines as well-utilized as possible.

The optimizer included in a C++ compiler operates under the "as-if rule", which says it can do any crazy transformation to your code that it wants, as long as the resulting program's observable behavior is indistinguishable from the behavior of the source code you wrote. For single-threaded programs that's fine, but once you get into having one thread try to read or write another thread's variables without any synchronization, all of the compiler's (and the CPU's) clever optimization tricks can become "visible" to the second thread and thus the second thread is likely to see undefined (read: weird and unexpected) behavior unless you are very careful with synchronization.

So how do the authors of low-level threading libraries like pthreads (which the QMutex and QThread classes are likely using in their internal implementation) hide all this chaos? They do it by inserting memory barriers and optimization barriers into their code at the appropriate spots. The optimization barriers tell the compiler "do not move memory accesses past this point while optimizing", and the memory barriers are similar except they are enforced by the CPU at run-time to constrain its out-of-order-execution optimizations.
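C++11 exposes both kinds of barrier directly, if you ever need them outside a mutex. A sketch of the distinction (the function names are mine, for illustration; real mutex implementations combine such fences with atomic lock words):

```cpp
#include <atomic>

void optimization_barrier() {
    // Compiler-only barrier: the optimizer may not reorder or cache memory
    // accesses across this point, but no CPU fence instruction is emitted.
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

void memory_barrier() {
    // Full barrier: also emits a hardware fence (e.g. MFENCE on x86), so the
    // CPU's out-of-order engine cannot reorder loads/stores across it either.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```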

Since you are using high-level constructs like QMutex, you don't have to worry about specifying barriers yourself, since the code in the QMutex and QThread class's member functions already does that for you as necessary.

But to answer your question: By default, there are no boundaries on compiler optimization, other than the ones that are explicitly specified by the programmer or the C++ specification (and the C++ specification tries really hard to be as liberal as possible, since it doesn't want to preclude optimizations if it doesn't have to).

Jeremy Friesner
  • Your answer is perfect and exactly what I needed; it made me see things I had missed, thank you very much. So I can't fully trust the compiler when I write multithreaded applications. On the other hand, as you said, I can trust higher-level professional libraries, because they already do the necessary things internally. And indeed they do.... –  Apr 10 '16 at 10:23
  • ... One thing I had missed: I looked through the QMutex code and saw the memory-barrier mechanism in the unlock() function, ensuring that all preceding writes complete before unlock() finishes. However, I hadn't considered the same mechanism in the QMutex() constructor; now I think it must have memory barriers there too, which makes a lot of sense. –  Apr 10 '16 at 10:24
  • Typically you would create your QMutex objects before spawning any of the threads that might use them; otherwise you'd run the risk of the other threads trying to lock the QMutex before its constructor had finished executing. In that case the necessary barriers would be provided by the pthread_create() call. In the (hopefully rare) special case where you are creating a QMutex that might be immediately used by another thread that is already running, you would need to somehow synchronize your threads' behavior (perhaps using a second QMutex?) to avoid a race condition. – Jeremy Friesner Apr 11 '16 at 02:51