
I wrote a multi-threaded program to demonstrate the out-of-order effects of Intel processors. The program is attached at the end of this post. I expected handler1 to print x as either 42 or 0. However, the actual result is always 42, which means that the out-of-order effect does not happen.

I compiled the program with the command "gcc -pthread -O0 out-of-order-test.c". I ran the compiled program on Ubuntu 12.04 LTS (Linux kernel 3.8.0-29-generic) on an Intel Ivy Bridge processor, an Intel(R) Xeon(R) CPU E5-1650 v2.

Does anyone know what I should do to see the out of order effect?

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>  /* sleep() */

int f = 0, x = 0;

void* handler1(void *data)
{
    while (f == 0);
    // Memory fence required here
    printf("%d\n", x);
    return NULL;
}

void* handler2(void *data)
{
    x = 42;
    // Memory fence required here
    f = 1;
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid1, tid2;

    pthread_create(&tid1, NULL, handler1, NULL);
    pthread_create(&tid2, NULL, handler2, NULL);

    sleep(1);
    return 0;
}
cadaniluk
Mike
  • That is not about out-of-order, but a race-condition. (x86 is an in-order load/store architecture, btw.). – too honest for this site Nov 21 '15 at 22:17
  • @Olaf, Thanks for your comment. However, x86 at least has the load-after-store issue according to http://stackoverflow.com/questions/7346893/out-of-order-execution-and-memory-fences . If there is a data dependence on one core, I know that the hardware will keep the order. Otherwise, the out-of-order mechanism may execute the next instruction before the previous instruction. – Mike Nov 21 '15 at 22:55
  • Out of order here is not about instruction execution anyway. Hardly possible for a multi-issue architecture like (almost) all high-end architectures. This is about load/store. But you will not exploit anything here with your code. And likely not with any C code - at least not reliably. The latter simply because you need a specific instruction sequence, which you will not have control of when using a compiler. So, dive into assembler and try. Good luck. – too honest for this site Nov 21 '15 at 23:10
  • You're looking for http://preshing.com/20120515/memory-reordering-caught-in-the-act. Jeff Preshing's code will demonstrate StoreLoad reordering on x86 (the only kind that's possible on x86). You will never see StoreStore reordering on x86, because it's strongly-ordered. Every store has release semantics, and every load has acquire semantics. Read the other posts on Preshing's blog to understand what that means. – Peter Cordes Nov 22 '15 at 07:00
  • On a weakly-ordered architecture like ARM or PPC, your code *could* observe f=1 without x=42, but you should test in a loop. Just testing once is extremely unlikely to find anything. Both stores will probably be globally visible before the second thread starts up. (And thread startup will probably run a memory barrier instruction at some point anyway!) – Peter Cordes Nov 22 '15 at 07:03

2 Answers


You are mixing the race condition with an out-of-order execution paradigm. Unfortunately I am pretty sure you cannot "expose" the out-of-order execution as it is explicitly designed and implemented in such a way as to shield you (the running program and its data) from its effects.

More specifically: the out-of-order execution takes place "inside" a CPU in its full entirety. The results of out-of-order instructions are not directly posted to the register file but are instead queued up to preserve the order. So even if the instructions themselves are executed out of order (based on various rules that primarily ensure that those instructions can be run independently of each other) their results are always re-ordered to be in a correct sequence as is expected by an outside observer.

What your program does is this: it tries (very crudely) to simulate a race condition in which you hope to see the assignment of f done ahead of x, and at the same time you hope to have a context switch happen at that very moment, and you assume the new thread will be scheduled on the very same CPU core as the other one. However, as I have explained above, even if you do get lucky enough to hit all the listed conditions (a second thread scheduled right after the f assignment but before the x assignment, on the very same CPU core), which is in itself an extremely low-probability event, all you really expose is a potential race condition, not out-of-order execution.

Sorry to disappoint you but your program won't help you with observing the out-of-order execution effects. At least not with a high enough probability as to be practical.

You may read a bit more about out-of-order execution here: http://courses.cs.washington.edu/courses/csep548/06au/lectures/introOOO.pdf

UPDATE Having given it some thought, I think you could try modifying the instructions on the fly in hopes of exposing the out-of-order execution. But even then I'm afraid this approach will fail, as the new "updated" instruction won't be correctly reflected in the CPU's pipeline. What I mean is: the CPU will most likely have already fetched and decoded the instruction you are about to modify, so what is executed will no longer match the content of the memory word (even the one in the CPU's L1 cache). This technique, assuming it can help you at all, requires some advanced programming directly in assembly and will require your code to run at the highest privilege level (ring 0). I would recommend extreme caution when writing self-modifying code, as it has great potential for side effects.

YePhIcK
  • @Yephlck, Thank you so much for your detailed explanation and correction! I get your point now: the CPU executes out of order inside its pipeline, but when it exposes the results to the outside, they are always reordered into the original sequence. Based on this, I think my program will "never" expose the out-of-order effect, rather than exposing it with a very small probability, because f is always assigned after x is assigned in handler2(). Am I correct? – Mike Nov 21 '15 at 23:42

PLEASE NOTE: The following only addresses MEMORY reordering. To my knowledge you cannot observe out-of-order execution outside the pipeline, since that would constitute a failure of the CPU to adhere to its interface (i.e., it would be a bug, and you should tell Intel). Specifically, there would have to be a failure in the reorder buffer and instruction retirement bookkeeping.

According to Intel's documentation (specifically Volume 3A, section 8.2.3.4):

The Intel-64 memory-ordering model allows a load to be reordered with an earlier store to a different location.

It also specifies (I'm summarizing, but all of this is available in section 8.2 Memory Ordering with examples in 8.2.3) that loads are never reordered with loads, stores are never reordered with stores, and stores are never reordered with earlier loads. This means there are implicit fences (3 of the weak types) between these operations in Intel 64.
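These rules are why the program in the question always prints 42 on x86: the two stores in handler2 cannot pass each other, and neither can the two loads in handler1. The hardware guarantee does not bind the compiler, though, so a portable version of that flag-and-payload pattern is usually written with C11 atomics. The following is only a rough sketch under that assumption; the names writer, reader, and run_message_passing are invented for illustration and do not come from the question.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int x;         /* payload, an ordinary int */
static atomic_int f;  /* flag that publishes the payload */

/* Writer: store the payload, then set the flag with release ordering,
   so the store to x cannot be moved after the store to f. */
static void *writer(void *arg)
{
    (void)arg;
    x = 42;
    atomic_store_explicit(&f, 1, memory_order_release);
    return NULL;
}

/* Reader: spin on the flag with acquire ordering, so the load of x
   cannot be moved before the load of f that observed 1. */
static void *reader(void *out)
{
    while (atomic_load_explicit(&f, memory_order_acquire) == 0)
        ;  /* busy-wait until the writer publishes */
    *(int *)out = x;
    return NULL;
}

/* Hypothetical helper: runs one writer/reader pair and returns
   the value of x that the reader observed. */
int run_message_passing(void)
{
    pthread_t r, w;
    int seen = -1;

    x = 0;
    atomic_store(&f, 0);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return seen;
}
```

On x86 a release store and an acquire load typically compile to plain mov instructions, because the hardware already provides that ordering; on a weakly ordered CPU the same source would make the compiler emit the barriers it needs.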

To observe memory reordering, you just need to implement that example with enough care to actually observe the effects. Here is a link to a full implementation I did that demonstrates this. (I will follow up with more details in the accompanying post here).

Essentially the first thread (processor_0 from the example) does this:

    x = 1;
#if CPU_FENCE
    __cpu_fence();
#endif
    r1 = y;

inside of a while loop in its own thread (pinned to a CPU using SCHED_FIFO:99).

The second (observer, in my demo) does this:

    y = 1;
#if CPU_FENCE
    __cpu_fence();
#endif
    r2 = x;

also in a while loop in its own thread with the same scheduler settings.

Reorders are checked for like this (exactly as specified in the example):

    if (r1 == 0 and r2 == 0)
        ++reorders;

With the CPU_FENCE disabled, this is what I see:

[  0][myles][~/projects/...](master) sudo ./build/ooo
after 100000 attempts, 754 reorders observed

With the CPU_FENCE enabled (which uses the "heavyweight" mfence instruction) I see:

[  0][myles][~/projects/...](master) sudo ./build/ooo
after 100000 attempts, 0 reorders observed
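For anyone who wants to try the experiment without the linked project, here is a rough self-contained sketch. It is not the code behind the numbers above. The names count_reorders, begin0, begin1, and done are made up for illustration, C11's atomic_thread_fence(memory_order_seq_cst) stands in for __cpu_fence(), and the CPU pinning and SCHED_FIFO settings are left out. It also assumes Linux with unnamed POSIX semaphores. How many reorders show up without the fence will vary by machine and run.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;            /* the two shared locations */
static int r1, r2;                 /* what each thread loaded */
static sem_t begin0, begin1, done; /* per-iteration start/finish signals */
static long iterations;
static int use_fence;

/* processor_0: store x, optionally fence, then load y. */
static void *processor_0(void *arg)
{
    (void)arg;
    for (long i = 0; i < iterations; i++) {
        sem_wait(&begin0);
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        if (use_fence)
            atomic_thread_fence(memory_order_seq_cst);
        r1 = atomic_load_explicit(&y, memory_order_relaxed);
        sem_post(&done);
    }
    return NULL;
}

/* processor_1: the mirror image, store y then load x. */
static void *processor_1(void *arg)
{
    (void)arg;
    for (long i = 0; i < iterations; i++) {
        sem_wait(&begin1);
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        if (use_fence)
            atomic_thread_fence(memory_order_seq_cst);
        r2 = atomic_load_explicit(&x, memory_order_relaxed);
        sem_post(&done);
    }
    return NULL;
}

/* Hypothetical driver: runs n rounds and counts how often both
   threads read 0, which requires a load to pass an earlier store. */
long count_reorders(long n, int fence)
{
    pthread_t t0, t1;
    long reorders = 0;

    iterations = n;
    use_fence = fence;
    sem_init(&begin0, 0, 0);
    sem_init(&begin1, 0, 0);
    sem_init(&done, 0, 0);
    pthread_create(&t0, NULL, processor_0, NULL);
    pthread_create(&t1, NULL, processor_1, NULL);

    for (long i = 0; i < n; i++) {
        atomic_store(&x, 0);       /* reset before releasing the workers */
        atomic_store(&y, 0);
        sem_post(&begin0);
        sem_post(&begin1);
        sem_wait(&done);           /* wait for both workers to finish */
        sem_wait(&done);
        if (r1 == 0 && r2 == 0)
            ++reorders;
    }

    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    sem_destroy(&begin0);
    sem_destroy(&begin1);
    sem_destroy(&done);
    return reorders;
}
```

Calling count_reorders(100000, 0) corresponds to the run with CPU_FENCE disabled and count_reorders(100000, 1) to the run with it enabled. With the fence in both threads the count should always be 0, since the outcome r1 == 0 and r2 == 0 is then forbidden.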

I hope this clarifies things for you!

Myles Hathcock