3

I'm running x86, and I want to practically see a bug caused by out-of-order execution on my machine. I tried writing one, based off this wiki article, but I always see "value of x is 33":

#include<stdio.h>
#include<pthread.h>
#include <sys/types.h>

int x, f;

void *handler(void *ptr) { 
  while (f == 0); 
  // Expectation: Sometimes, this should print 11 due to out-of-order exec
  printf("value of x is %d \n", x);
  return NULL;
}

int main() {
     pthread_t thread1;
     while(1) {
       x = 11; f = 0;
       pthread_create(&thread1, NULL, handler, NULL);
       x = 33; 
       f = 1;
       pthread_join(thread1, NULL);
     }   
     return 0;
}

What's the simplest c program that can illustrate an out-of-order execution bug? Why doesn't this sometimes print "value of x is 11"?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Sush
  • 1,169
  • 1
  • 10
  • 26
  • the thread doesn't start quicky enough. Just add a `sleep()` after having created it in the main function to get your 11 value – Jean-François Fabre Oct 07 '18 at 18:29
  • another way would be (maybe) loading the system with some CPU-intensive processes to change de race condition – Jean-François Fabre Oct 07 '18 at 18:30
  • 4
    @Jean-FrançoisFabre but `handler` does nothing until `f = 1;` had been done, by which time `x = 33;` was already done. – Weather Vane Oct 07 '18 at 18:31
  • 2
    @Jean-FrançoisFabre Dont think there's a race condition here. – Sush Oct 07 '18 at 18:35
  • @WeatherVane `why doesn't this sometimes print "value of x is 11"?` perhaps that is the answer. – kiran Biradar Oct 07 '18 at 18:36
  • @kiranBiradar but "your program works" is not an answer. Should the question be on CodeReview? – Weather Vane Oct 07 '18 at 18:37
  • @WeatherVane I'm not sure though there is a `what` in the question. – kiran Biradar Oct 07 '18 at 18:42
  • @kiranBiradar that's asking for code. Perhaps OP should experiment, perhaps by dropping the `while (f == 0);` condition. – Weather Vane Oct 07 '18 at 18:44
  • @WeatherVane I tried it still produces `33`. Maybe it needs detailed explanation of why sequence of thread execution is not guaranteed. – kiran Biradar Oct 07 '18 at 18:49
  • To clarify, my question is not about a race condition between the main thread and thread1- I don't think there is one. However, my understanding is that the above code is still wrong as x86 (and other processors) do out of order execution- in other words, the printf might already be executed before the while(f==0); loop ends, using the older value of x=11 (due to the Reorder Buffer). My aim is to find a simple example where this is illustrated. – Sush Oct 07 '18 at 18:54
  • 3
    @kiranBiradar after dropping that `while (f == 0);` restraint, if you run the program many times perhaps sooner or later the time-slice scheduling will separate `pthread_create(&thread1, NULL, handler, NULL);` from `x = 33;` and run `handler` in that crack, to report `11`. – Weather Vane Oct 07 '18 at 18:55
  • @WeatherVane Yes right you are. – kiran Biradar Oct 07 '18 at 18:58
  • Everywhere you say "out of order" execution you probably wanted to say "memory ordering". Out of order execution is an implementation technique that has only a weak relationship to memory ordering and the effect you're looking for is pretty much a memory ordering one. Implement Dekker's algorithm and you'll see it very quickly. – BeeOnRope Oct 08 '18 at 00:21

1 Answers1

6

The effect you're trying to create is not dependent out-of-order execution. That's only one of the things that can create memory reordering. Plus, modern x86 does out-of-order execution but uses its Memory Order Buffer to ensure that stores commit to L1d / become globally visible in program order. (Because x86's memory model only allows StoreLoad reordering, not StoreStore.)

Memory-reordering is separate from instruction execution reordering, because even in-order CPUs use a store buffer to avoid stalling on cache-miss stores.

Out-of-order instruction execution: is commit order preserved?

Are loads and stores the only instructions that gets reordered?


A C implementation on an in-order ARM CPU could print either 11 or 33, if x and f ended up in different cache lines.


I assume you compiled with optimization disabled, so your compiler effectively treats all your variables volatile, i.e. volatile int x,f. Otherwise the while(f==0); loop will compile to if(f==0) { infloop; }, only checking f once. (Data race UB for non-atomic variables is what allows compilers to hoist loads out of loops, but volatile loads have to always be done. https://electronics.stackexchange.com/questions/387181/mcu-programming-c-o2-optimization-breaks-while-loop#387478).

The stores in the resulting asm / machine code will appear in C source order.

You're compiling for x86, which has a strong memory model: x86 stores are release-stores, and x86 loads are acquire loads. You don't get sequential-consistency, but you get acq_rel for free. (And with un-optimized code, it happens even if you don't ask for it.)

Thus, when compiled without optimization for x86, your program is equivalent to

_Atomic int x, f;

int main(){
    ...
    pthread_create
    atomic_store_explicit(&x, 33, memory_order_release);
    atomic_store_explicit(&f, 1, memory_order_release);
    ...
}

And similarly for the load side. The while(f==0){} is an acquire-load on x86, so having the read side wait until it sees non-zero f guarantees that it also sees x==33.

But if you compiled for a weakly-ordered ISA like ARM or PowerPC, the asm-level memory-ordering guarantees there do allow StoreStore and LoadLoad reordering, so it would be possible for your program to print 11 if compiled without optimization.

See also https://preshing.com/20120930/weak-vs-strong-memory-models/

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847