
I found a strange phenomenon when using a global variable as a thread parameter in C++. Here is the code. The program never ends when compiled with -O2.

#include <thread>
#include <iostream>
#include <cstring>   // memset
#include <unistd.h>  // sleep

using namespace std;

#define nthreads 2
struct Global{
    int a[nthreads];
};

void func(int* a){
    while(*a == 0){
    }
    cout << "done" << endl;
}

struct Global global;

int main(){
    thread pid[nthreads];
    memset(global.a, 0, nthreads*sizeof(int));

    for(int i=0; i<nthreads; i++){
        pid[i] = std::thread(func, &global.a[i]);
    }

    sleep(2);
    cout << "finished" << endl;
    memset(global.a, 1, nthreads*sizeof(int));

    for(int i=0; i<nthreads; i++){
        pid[i].join();
    }

    return 0;
}

With -O0 everything seems OK. And if I print the variable *a inside the while loop, it is still OK.

So I guess it must be a problem with C++ optimization. But how can the compiler make such an aggressive optimization on a global variable in a multi-threaded program?


Thanks for all the answers and comments. I tried volatile and it does work. I don't want to use a mutex, because locking a mutex in every loop iteration hurts performance.
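Roughly, the change was just to make the pointed-to value volatile, so the compiler has to re-read it on every iteration. A sketch of the idea, not the exact code:

void func(volatile int* a){
    while(*a == 0){    // *a is re-read from memory on every iteration
    }
    cout << "done" << endl;
}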

In fact, I want to do something like this:

  1. A worker thread loops through a global list and does some work in each iteration of its while loop. (I don't want to use a mutex here, because it does not matter even if a mistake happens in a single iteration.)

  2. Some other threads may occasionally add items to this list. (Using a mutex here is OK, because each thread only adds once.)

What is the correct way to do this?

Waker Leo
  • There's no point in using a mutex in only one of the two threads - it's literally a MUTual EXclusion mechanism. Doesn't work on one side only. – MSalters Sep 02 '16 at 08:25
  • On the last part, what you're describing is called a producer-consumer pattern. You may want to look into [Boost LockFree](http://www.boost.org/doc/libs/1_61_0/doc/html/lockfree.html) which is designed for such patterns. – MSalters Sep 02 '16 at 08:28
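For reference, a minimal sketch of the producer-consumer approach MSalters suggests, using boost::lockfree::queue (this assumes Boost is available; the element type int, the capacity 128, and all names here are illustrative, not taken from the question):

#include <boost/lockfree/queue.hpp>
#include <atomic>
#include <thread>
#include <iostream>

boost::lockfree::queue<int> work(128);   // lock-free MPMC queue, 128 nodes pre-allocated
std::atomic<bool> done{false};

void worker(){
    int item;
    while(!done.load()){
        while(work.pop(item)){           // consume without any mutex
            std::cout << "got " << item << '\n';
        }
        // note: this sketch spins when the queue is empty; a real design might yield or block
    }
    while(work.pop(item)){               // drain whatever is left
        std::cout << "got " << item << '\n';
    }
}

void producer(int base){
    for(int i = 0; i < 10; ++i){
        work.push(base + i);             // lock-free push from any thread
    }
}

int main(){
    std::thread w(worker);
    std::thread p1(producer, 0), p2(producer, 100);
    p1.join(); p2.join();
    done = true;
    w.join();
}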

2 Answers


The current code allows the compiler to optimize as if there were no threading. So when the compiler sees a loop with an unchanging condition, it can just optimize that away. Or, as seems likely for the behavior you observed, replace the intended memory fetch in the condition with a value kept in a register.
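To illustrate, here is a sketch of the transformation the optimizer is allowed to perform on func (illustrative only, not actual compiler output):

void func( int* a )
{
    int const cached = *a;        // one load, kept in a register
    if( cached == 0 )
    {
        for( ;; ) {}              // the condition can never change, so loop forever
    }
    cout << "done" << endl;
}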

One way to make this work is to use std::atomic.

I've only dabbled in multithreading in modern C++, for purposes of learning and exploration, but this code's working:

#include <atomic>
#include <array>
#include <chrono>
#include <thread>
#include <iostream>
using namespace std;

int const nthreads = 2;

void func( atomic<int>* a )
{
    while( a->load() == 0 )
    {}
    cout << "done" << endl;
}

namespace global {
    array<atomic<int>, nthreads> a;     // Zero-initialized automatically.
}  // namespace global

auto main()
    -> int
{
    using namespace std::chrono_literals;

    thread pid[nthreads];
    for( int i = 0; i < nthreads; ++i )
    {
        pid[i] = thread( func, &global::a[i] );
    }

    this_thread::sleep_for( 2ms );
    cout << "finished" << endl;
    for( auto& item : global::a )
    {
        item = int( unsigned( -1 ) & 0x0101010101010101 );   // = 0x01010101, the same bit pattern memset(..., 1, ...) produces
    }

    for( int i = 0; i < nthreads; ++i) { pid[i].join(); }
}
Cheers and hth. - Alf

Because the compiler knows nothing about threads. All it does is compile the code it's told to compile.

A compiler is permitted to optimize away redundant code. Here, the compiler knows it has already retrieved the value the pointer points to once, so it doesn't need to do it again.

There are several ways to do this correctly.

  1. Use volatile. The volatile qualifier explicitly tells the compiler not to optimize away any access to the volatile object.

  2. The volatile keyword by itself is not always sufficient to correctly implement multithreaded execution. Although you could get away with it here, the correct way to properly sequence multithreaded execution is with mutexes and condition variables (a sketch follows below). You will find a complete description in your favorite C++ book.
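For example, a minimal sketch of the mutex + condition variable approach applied to the original wait-for-a-flag pattern (the names flag, m and cv are illustrative, not from the question):

#include <condition_variable>
#include <mutex>
#include <thread>
#include <iostream>

std::mutex m;
std::condition_variable cv;
int flag = 0;                                 // protected by m

void func(){
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, []{ return flag != 0; });   // blocks instead of spinning
    std::cout << "done" << std::endl;
}

int main(){
    std::thread t(func);
    {
        std::lock_guard<std::mutex> lock(m);
        flag = 1;                             // modify shared state under the lock
    }
    cv.notify_all();                          // then wake the waiting thread
    t.join();
}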

Sam Varshavchik
  • [`volatile` is basically useless here](http://stackoverflow.com/a/4558031/179910). It is neither necessary nor sufficient to get the desired behavior. – Jerry Coffin Sep 02 '16 at 02:43
  • @JerryCoffin: `volatile` prevents the compiler from optimizing away the checking loop. That's basically what `volatile` does: preventing each memory access from being optimized away, useful for e.g. a memory mapped i/o card register. But it doesn't guarantee that the value fetched is an updated value: it could be a value from a cache. – Cheers and hth. - Alf Sep 02 '16 at 02:47
  • **−1** Because the advice "to do this correctly. 1 Use `volatile`" is incorrect on many (most?) platforms. And isn't guaranteed by the standard. Andrei Alexandrescu, I think together with Herb Sutter, wrote a decent article about why `volatile` is very ungood for multithreading in portable code. I don't have a link, sorry, but google it. Instead of `volatile` the code could use e.g. an `atomic` variable. Or explicit synchronization. – Cheers and hth. - Alf Sep 02 '16 at 02:54
  • @Cheersandhth.-Alf: I'd guess the paper you're thinking of is by [Alexandrescu and Meyers](http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf). – Jerry Coffin Sep 02 '16 at 03:00
  • @JerryCoffin: Yes, that's it. Thanks! :) – Cheers and hth. - Alf Sep 02 '16 at 03:15