
According to Intel's manual, section 8.2.3.2: Neither Loads Nor Stores Are Reordered with Like Operations

(from the document at https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html)

But when I created a simple test case, I found that r1=1 and r2=0 did happen.

#include <thread>
#include <iostream>

using namespace std;

volatile int x;
int b[500];
volatile int y;
volatile int start;

int s1;
int s2;
int s3;
int s0;
int foo()
{
    while(start==0);
    x=1;
    asm volatile("" ::: "memory");
    y=1;
    return 0;
}

int fool2()
{
    int a,b;
    while(start==0);
    a=x;
    asm volatile("" ::: "memory");
    b=y;

   if(a==0 && b==1)
         s0++;
   if(a==0 && b==0)
         s1++;
   if(a==1 && b==0)
         s2++;
   if(a==1 && b==1)
        s3++;
   return 0;
}

int main()
{
  int i=0;
  while(1)
  {
     x=y=0;
     thread t1(foo);
     thread t2(fool2);
     start = 1;
     t1.join();
     t2.join();
     i++;
     if((i&0xFFFF)==0)
     {
           cout<<s0<<" "<<s1<<" "<<s2<<" "<<s3<<endl;
     }
  }
}

g++ -O2 -pthread e.cpp

gcc version 7.5.0

output:

69 86538 1 19246512

All four cases (every combination of 0 and 1 for r1 and r2) are possible.

  • `volatile` is **not** a valid thread synchronization technique in C++. That means your code has data races and those have undefined behavior, meaning your code does as well. – NathanOliver Jul 19 '21 at 16:41
  • It's quite possible for the `b=y;` to execute on t2 before the `y=1;` executes on t1. A similar thing happens for the counts of `s1`. – 1201ProgramAlarm Jul 19 '21 at 17:25
  • The test code is testing Intel processor behavior. Intel's memory-ordering spec says that neither loads nor stores are reordered with like operations while a core runs the program. – frank Jul 19 '21 at 17:44
  • see https://i.stack.imgur.com/CO2MC.png – frank Jul 19 '21 at 18:00
  • Sorry, it was an error in my code. `b=y` should come before `a=x`. – frank Jul 19 '21 at 18:16
  • I suggest checking generated asm first to ensure it does what you think it does. Your C++ code has undefined behaviour (basically compiler optimizations may produce completely different code than you expect). If you want to check your case -- either write in asm or make sure C++ compiler generated precisely what you want it to generate. – C.M. Jul 19 '21 at 18:36
  • @NathanOliver: You're talking about ISO C++. The question is using G++, i.e. GCC, which does support using `volatile` that way (notably in the Linux kernel's hand-rolled atomics with volatile and inline asm). It's not *recommended*, but as I explained in [When to use volatile with multi threading?](https://stackoverflow.com/a/58535118) it does *work* in practice somewhat like `atomic` with `mo_relaxed` on compilers that handle it the way G++ does, on hardware with coherent caches (like all CPUs that current C++ implementations will start std::thread across.) – Peter Cordes Jul 19 '21 at 18:58
  • Looks like you don't set `start = 0;` *after* thread.join, so the next pair of threads will ignore their `while(start==0)` spin-loops. I haven't yet figured out whether that could explain things, or if you have other bugs or wrong assumptions. The `x=y=0` happens before the threads are even started, so that's safe; a newly-started thread won't see `1`s left over from the previous run. – Peter Cordes Jul 19 '21 at 19:04
  • Your code is broken: `foo.cpp:21:1: warning: no return statement in function returning non-void` in both `fool1` and `fool2`. g++11.1 compiles those functions to infinite loops, I think (because it assumes the return path is unreachable because that would be UB), so nothing ever gets printed. Also, g++ warns about the missing return type in your `main()` definition, but does accept it. – Peter Cordes Jul 19 '21 at 19:14
  • After fixing those bugs (including adding `start=x=y=0`), I did eventually get a few counts in the first column (and 2 in the 3rd) on my i7-6700k Skylake running Linux, so yeah there's something else weird about this code. – Peter Cordes Jul 19 '21 at 19:24
  • @PeterCordes, thanks. I found the error in my code. In `fool2` it should be `b=y; asm volatile("" ::: "memory"); a=x;` – frank Jul 19 '21 at 21:07
  • Oh right, I see, you're reading the variables in the *same* order they were written, not opposite, so all possibilities can be a result of simple program-order interleaving. – Peter Cordes Jul 19 '21 at 21:45

1 Answer

Take a closer look at Section 8.2.3.2 of the Intel manual. In your example, you are effectively doing:

| Processor 1 | Processor 2 |
|---|---|
| `mov [_x], 1` | `mov r2, [_x]` |
| `mov [_y], 1` | `mov r1, [_y]` |

instead of what the Intel manual describes:

| Processor 1 | Processor 2 |
|---|---|
| `mov [_x], 1` | `mov r1, [_y]` |
| `mov [_y], 1` | `mov r2, [_x]` |

In your example, processor 2 may load _x before processor 1 stores to it, and then load _y after processor 1 stores to it. That allows (r1=1, r2=0) without any reordering:

| Instruction | Processor |
|---|---|
| `mov r2, [_x]` | 2 |
| `mov [_x], 1` | 1 |
| `mov [_y], 1` | 1 |
| `mov r1, [_y]` | 2 |

In the Intel example, processor 2 loads _x only after it has loaded _y, and processor 1 stores to _y only after it has stored to _x. If processor 2 sees _y == 1, the store to _x has already happened, so (r1=1, r2=0) is impossible.
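The argument above can also be checked by brute force. The sketch below is a model, not a hardware test: it enumerates every interleaving of the two stores with the two loads, assuming each processor's operations take effect in program order. `Outcome` and `reachable_outcomes` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <set>
#include <utility>

// (r1, r2): r1 is the value loaded from y, r2 the value loaded from x.
using Outcome = std::pair<int, int>;

// Enumerate every interleaving of processor 1's two stores with processor 2's
// two loads, keeping each processor's own operations in program order.
// load_y_first == true  models the Intel example (load y, then x);
// load_y_first == false models the question's code (load x, then y).
std::set<Outcome> reachable_outcomes(bool load_y_first) {
    std::set<Outcome> outcomes;
    // 0 = next step of processor 1, 1 = next step of processor 2.
    // The array starts sorted, so next_permutation visits all 6 schedules.
    std::array<int, 4> schedule{0, 0, 1, 1};
    do {
        int x = 0, y = 0, r1 = -1, r2 = -1;
        int p1_step = 0, p2_step = 0;
        for (int who : schedule) {
            if (who == 0) {
                // Processor 1: store x first, then store y.
                if (p1_step++ == 0) x = 1; else y = 1;
            } else {
                // Processor 2: the first load reads y only in the Intel order.
                bool reads_y = ((p2_step++ == 0) == load_y_first);
                if (reads_y) r1 = y; else r2 = x;
            }
        }
        outcomes.insert(Outcome(r1, r2));
    } while (std::next_permutation(schedule.begin(), schedule.end()));
    return outcomes;
}
```

Calling `reachable_outcomes(true)` from a `main` gives three outcomes, and (r1=1, r2=0) is not among them. `reachable_outcomes(false)` gives all four, which matches the question's counters: `s0` counts exactly the `a==0 && b==1` case, i.e. r2=0 and r1=1.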

Here is some code that demonstrates the Intel behavior:

#include <thread>
#include <iostream>
#include <stdlib.h>

using namespace std;

volatile int x;
volatile int y;
volatile int start;

constexpr bool flipOrdering = true; //Set this to true to see Intel example, false to see your example
constexpr int jitter = 10000;       //Range of random delay inserted between load/stores to make differences more obvious

int s1;
int s2;
int s3;
int s0;
int foo() {

    while(start==0);

    for(volatile int i = rand()%jitter; i; --i);
    x = 1;
    
    for(volatile int i = rand()%jitter; i; --i);
    asm volatile("" ::: "memory");

    for(volatile int i = rand()%jitter; i; --i);
    y = 1;

    return 0;
}

int fool2() {
    int a, b;
    while(start==0);

    for(volatile int i = rand()%jitter; i; --i);
    if constexpr(flipOrdering) b = y;
    else a = x;

    for(volatile int i = rand()%jitter; i; --i);
    asm volatile("" ::: "memory");

    for(volatile int i = rand()%jitter; i; --i);
    if constexpr(flipOrdering) a = x;
    else b = y;

   if(a==0 && b==1)
         s0++;
   if(a==0 && b==0)
         s1++;
   if(a==1 && b==0)
         s2++;
   if(a==1 && b==1)
        s3++;

    return 0;
}

int main() {
    int i=0;
    while(i< 1000) {
        start=x=y=0;   // also reset start, otherwise later thread pairs skip their spin loops
        thread t1(foo);
        thread t2(fool2);
        start = 1;
        t1.join();
        t2.join();
        i++;

        if((i%100)==0) {
            cout<<s0<<" "<<s1<<" "<<s2<<" "<<s3<<endl;
        }
    }

    return 0;
}

And here's a link to the same code running in Compiler Explorer.
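Since the comments point out that `volatile` is not a synchronization mechanism in ISO C++, here is a hedged sketch of the same Intel-order test written with `std::atomic`. It is not the code above: the names `ax`, `ay`, `go` and `count_forbidden` are made up for illustration, and the release/acquire orderings are my choice. On x86-64, GCC compiles these acquire loads and release stores to plain `mov` instructions, so the hardware is exercised the same way.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> ax{0}, ay{0};     // stand-ins for x and y
std::atomic<bool> go{false};       // stand-in for start

// Runs the Intel-order litmus test `rounds` times and returns how often the
// forbidden outcome (r1 == 1 && r2 == 0) was observed.
int count_forbidden(int rounds) {
    int forbidden = 0;
    for (int i = 0; i < rounds; ++i) {
        // Reset everything, including the start flag, before creating threads.
        ax.store(0, std::memory_order_relaxed);
        ay.store(0, std::memory_order_relaxed);
        go.store(false, std::memory_order_relaxed);
        int r1 = 0, r2 = 0;

        std::thread writer([] {
            while (!go.load(std::memory_order_acquire)) {}
            ax.store(1, std::memory_order_relaxed);
            ay.store(1, std::memory_order_release);   // publishes the store to ax
        });
        std::thread reader([&r1, &r2] {
            while (!go.load(std::memory_order_acquire)) {}
            r1 = ay.load(std::memory_order_acquire);  // load y first
            r2 = ax.load(std::memory_order_relaxed);  // then load x
        });

        go.store(true, std::memory_order_release);
        writer.join();
        reader.join();                                // r1 and r2 are safe to read now
        if (r1 == 1 && r2 == 0) ++forbidden;
    }
    return forbidden;
}
```

With this ordering the C++ memory model itself rules out (r1=1, r2=0), so `count_forbidden` should return 0 on any conforming implementation, not only on x86. Swapping the two loads in `reader` reproduces the question's situation, where all four outcomes are legal.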