Why do memory_order_relaxed and memory_order_seq_cst make no difference?

Question

I was playing with one of the examples in C++ Concurrency in Action which uses std::memory_order_relaxed for reading and writing 3 atomic variables from 5 different threads. The example program is as follows:

#include <thread>
#include <atomic>
#include <iostream>

std::atomic<int> x(0);
std::atomic<int> y(0);
std::atomic<int> z(0);
std::atomic<bool> go(false);

const unsigned int loop_count = 10;

struct read_values
{
   int x;
   int y;
   int z;
};

read_values values1[loop_count];
read_values values2[loop_count];
read_values values3[loop_count];
read_values values4[loop_count];
read_values values5[loop_count];

void increment( std::atomic<int>* v, read_values* values )
{
    while (!go)
       std::this_thread::yield();

    for (unsigned i=0;i<loop_count;++i)
    {
       values[i].x=x.load( std::memory_order_relaxed );
       values[i].y=y.load( std::memory_order_relaxed );
       values[i].z=z.load( std::memory_order_relaxed );
       v->store( i+1, std::memory_order_relaxed );
       std::this_thread::yield();
    }
}

void read_vals( read_values* values )
{

   while (!go)
      std::this_thread::yield();

   for (unsigned i=0;i<loop_count;++i)
   {
      values[i].x=x.load( std::memory_order_relaxed );
      values[i].y=y.load( std::memory_order_relaxed );
      values[i].z=z.load( std::memory_order_relaxed );
      std::this_thread::yield();
   }
}

void print( read_values* values )
{
   for (unsigned i=0;i<loop_count;++i)
   {
      if (i)
         std::cout << ",";
      std::cout << "(" << values[i].x <<","
                       << values[i].y <<","
                       << values[i].z <<")";
   }
   std::cout << std::endl;
}

int main()
{
   std::thread t1( increment, &x, values1);
   std::thread t2( increment, &y, values2);
   std::thread t3( increment, &z, values3);
   std::thread t4( read_vals, values4);
   std::thread t5( read_vals, values5);

   go = true;

   t5.join();
   t4.join();
   t3.join();
   t2.join();
   t1.join();

   print( values1 );
   print( values2 );
   print( values3 );
   print( values4 );
   print( values5 );

   return 0;
}

Every time I run the program I get exactly the same output:

(0,10,10),(1,10,10),(2,10,10),(3,10,10),(4,10,10),(5,10,10),(6,10,10),(7,10,10),(8,10,10),(9,10,10)
(0,0,1),(0,1,2),(0,2,3),(0,3,4),(0,4,5),(0,5,6),(0,6,7),(0,7,8),(0,8,9),(0,9,10)
(0,0,0),(0,1,1),(0,2,2),(0,3,3),(0,4,4),(0,5,5),(0,6,6),(0,7,7),(0,8,8),(0,9,9)
(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0)
(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0)

If I change from std::memory_order_relaxed to std::memory_order_seq_cst the program gives exactly the same output!

I would have expected different output from the 2 versions of the program. Why is there no difference between the output for std::memory_order_relaxed and std::memory_order_seq_cst?

Why does std::memory_order_relaxed always produce exactly the same results for every run of the program?

I am using:
- 32-bit Ubuntu installed as a virtual machine (under VMware)
- An Intel quad-core processor
- GCC 4.6.1-9

The code is compiled with: g++ --std=c++0x -g mem-order-relaxed.cpp -o relaxed -pthread

Note that -pthread is necessary; otherwise the following error is reported:

terminate called after throwing an instance of 'std::system_error'
  what(): Operation not permitted

Is the behaviour I am seeing due to a lack of support in GCC, or a result of running under VMware?

@Michael Burr - That's solved it. VMware was set to 1 core; I have just set it to 4 cores and now I get the expected results. Have to admit I am new to using VMware - for work I am used to just having Linux as the native OS. If you post your suggestion I will accept it as an answer. Cheers. — mark, Mar 27 '12 at 09:41

score 6 · Accepted Answer · answered Mar 27 '12 at 10:06

How many processor cores do you have assigned to the VM? Assign multiple cores to the VM to let it take advantage of concurrency.

Michael Burr

score 2 · Answer 2 · answered Sep 01 '13 at 05:10

Your use of yield is making your program's behaviour depend more on your platform's scheduler than on anything else.

That being said, memory_order_relaxed does not demand that the compiler reorder the atomic operations; it merely allows the compiler (and the hardware) to do so. If the compiler is happy with the ordering it gets with memory_order_seq_cst, then it may in fact emit exactly the same machine code! This is especially true on x86, because the instruction set already offers so many ordering guarantees that it isn't much of a leap to arrive at memory_order_seq_cst.

score 0 · Answer 3 · answered Sep 03 '13 at 05:55

Many versions of GCC ignore the memory ordering that you provide and replace it with sequential consistency. You can see this in the header files. Hopefully, they'll eventually have a better implementation. You can play around with the effects of relaxed vs. seq_cst by using CDSChecker...
