I'm learning about relaxed memory models, especially partial store ordering (PSO).
Much of the literature, including academic papers and the Linux kernel documentation, says that the pattern below is buggy under PSO, because thread1 can reorder its stores so that done = 1; can be executed before x = 1;.
thread1 {                thread2 {
                             if (done) {
    x = 1;                       if (x==0)
    y = 1;                           ERROR;
    done = 1;                    local = y;
}                            }
                         }
So far so good. It looks very clear.
Then I decided to reproduce the bug on a real machine. Here is the pseudo-code snippet in C (ignoring some details of the pthread APIs).
thread 1                              thread 2
--------                              --------
ptr = (int *)malloc(sizeof(*ptr));    local_ready = ready;
ready = 1;                            local_ptr = ptr;
                                      if (local_ready)
                                          local_a = *local_ptr;
thread_main
--------
while(true) {
    ptr = ready = 0;
    pthread_create(&ptr1, thread1);
    pthread_create(&ptr2, thread2);
    pthread_join(ptr1);
    pthread_join(ptr2);
}
All variables with the local_ prefix are local variables, and the others are global variables. The main function, in its while (true) loop, keeps initializing ptr and ready to 0, creating the two threads, and then joining them. I chose a Google Pixel 4 XL as the test machine because it has an aarch64 CPU (I know Arm's memory model is more relaxed than PSO, but I still believe it should show a similar reordering behavior for the above code snippet). I cross-compiled the program with a toolchain from the Android NDK using the -O0 option.
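For reference, here is a minimal, self-contained sketch of the same test. It is not the exact program I ran: it assumes C11 relaxed atomics in place of the plain globals (so the test itself has no undefined behavior, while the accesses are normally still compiled to plain loads and stores on AArch64), and it counts the bad observation instead of dereferencing the pointer. The names run_trials and observed_null are hypothetical and only exist for this illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

static _Atomic(int *) ptr;        /* shared pointer, NULL before each trial */
static atomic_int ready;          /* shared flag, 0 before each trial */
static atomic_int observed_null;  /* hypothetical counter of bad observations */

/* Writer: publish the pointer, then set the flag (both relaxed). */
static void *thread1(void *arg) {
    (void)arg;
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;              /* allocation failed: never set ready */
    atomic_store_explicit(&ptr, p, memory_order_relaxed);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

/* Reader: load the flag, then the pointer (both relaxed). */
static void *thread2(void *arg) {
    (void)arg;
    int local_ready = atomic_load_explicit(&ready, memory_order_relaxed);
    int *local_ptr = atomic_load_explicit(&ptr, memory_order_relaxed);
    /* The original test dereferences local_ptr here; this sketch only
       records the case that would have been a NULL dereference. */
    if (local_ready && local_ptr == NULL)
        atomic_fetch_add_explicit(&observed_null, 1, memory_order_relaxed);
    return NULL;
}

/* Runs n create/join rounds; returns the number of rounds in which the
   reader saw ready == 1 together with ptr == NULL, or -1 on a pthread error. */
long run_trials(long n) {
    atomic_store(&observed_null, 0);
    for (long i = 0; i < n; i++) {
        pthread_t t1, t2;
        atomic_store(&ptr, (int *)NULL);
        atomic_store(&ready, 0);
        if (pthread_create(&t1, NULL, thread1, NULL) != 0)
            return -1;
        if (pthread_create(&t2, NULL, thread2, NULL) != 0) {
            pthread_join(t1, NULL);
            return -1;
        }
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        free(atomic_load(&ptr));  /* free(NULL) is a no-op */
    }
    return atomic_load(&observed_null);
}
```

Calling run_trials with a large n from main and printing the result gives the number of rounds in which the reordering was actually observed. A nonzero count would correspond to the NULL dereference in the original test.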
But it never fails (i.e., it never dereferences a NULL pointer). I tried again after adding more instructions at the beginning of the two threads that write random values to random global variables. I hoped this would somehow increase the chance of store-store reordering, but it still never fails.
What am I missing? Am I just out of luck, or does the microarchitecture of the Google Pixel 4 XL's CPU not implement store-store reordering? I believe the buggy pattern is a real problem, because tons of academic papers deal with it (and even the Linux documentation says it is buggy!). I do believe there is a way to trigger the bug, but I can't find it.
Here is the first entry of /proc/cpuinfo. Is some kind of feature required to trigger the bug?
Processor : AArch64 Processor rev 14 (aarch64)
processor : 0
BogoMIPS : 38.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0xd
CPU part : 0x805
CPU revision : 14