
Given the code below, will the CPU reorder STORE a and STORE b? From the code logic, a and b are independent.

int* __attribute__ ((noinline)) GetMemAddr(int index) {
    static int data[10];
    return &data[0];
}

void fun() {
    int *a=GetMemAddr(1); //use different args to get same address to avoid optimization
    int *b=GetMemAddr(2);
    *a=1;
    *b=3;
}
Leeor
xYZ
  • C doesn't run directly on CPUs, it has to be compiled to asm. But no, there's no way the stores could reorder at compile time, or else they'd leave the wrong value long term. And at run-time, no I don't think any ISA exists that would let another thread observe `3` temporarily, then `1`, then `3` again as the final long-term value. But any C program that tried to check would have undefined behaviour, because `int data[]` is not `_Atomic`. From within the same thread (or signal handlers in that thread), your program will always see its own operations happen in program order. – Peter Cordes Dec 19 '18 at 05:30
  • You need to provide a specific architecture, but assuming you mean x86 or most common ones - no, stores do not reorder with other stores. Unless you use explicitly weakly ordered stores. – Leeor Dec 19 '18 at 06:01
  • https://stackoverflow.com/questions/1474030/how-can-i-tell-gcc-not-to-inline-a-function - `noinline` does nothing to guarantee that your function is called at all. GCC can very well see what it does, even if it is not inlined. It doesn't have *side-effects* because `data[]` is not volatile. – Antti Haapala -- Слава Україні Dec 19 '18 at 06:10
  • @Leeor: stores to the *same* address never reorder on x86, even weakly-ordered. (And I think on any sane architecture, because program logic dictates the final long-term value of the bytes in memory, so all cores have to see that one last unless there's temporary flip-flopping). – Peter Cordes Dec 19 '18 at 06:29
  • @PeterCordes, yes, I was talking about the general case of 2 stores, as the rule says. In the example above the matching address would be detected and block (or possibly merge) the 2nd store. – Leeor Dec 19 '18 at 06:42

2 Answers

3

Your question is pretty much pointless as it is now.

int* __attribute__ ((noinline)) GetMemAddr(int index) {
    static int data[10];
    return &data[0];
}

void fun() {
    int *a=GetMemAddr(1); //use different args to get same address to avoid optimization
    int *b=GetMemAddr(2);
    *a=1;
    *b=3;
}

Compiled with GCC 7.3 at -O3, this elides the first call to GetMemAddr completely because the call has no side effects. It elides the assignment *a=1 too. noinline only means that the function must not be inlined. It doesn't mean that the function needs to be called at all.

The only proper way to actually avoid elision is to declare a and b as volatile int * pointers. The compiler must then also emit the stores in program order. However, that still does not in any way guarantee that the stores are atomic, so another thread can see funny things happening; for that you need to use the C11 atomic features, or a compiler extension/guarantee.
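As a minimal sketch of the volatile variant described above (assuming GCC or Clang for the `noinline` attribute; the `(void)index` line is only there to silence an unused-parameter warning and is not in the question's code):

```c
/* Same helper as in the question: every call returns the same address. */
int* __attribute__ ((noinline)) GetMemAddr(int index) {
    static int data[10];
    (void)index;              /* unused, as in the original */
    return &data[0];
}

void fun(void) {
    /* volatile-qualified pointers: each access through them is an
       observable side effect that the compiler may not elide. */
    volatile int *a = GetMemAddr(1);
    volatile int *b = GetMemAddr(2);
    *a = 1;                   /* must be emitted */
    *b = 3;                   /* must be emitted, after the store above */
}
```

This only constrains what the compiler emits. It says nothing about atomicity or about what another thread may observe.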

1

The CPU can reorder the two stores so long as doing so doesn't violate any guarantee that the CPU has to provide. It's the compiler's job to generate code for the CPU that doesn't allow it to make optimizations that cause the generated code to violate the C standard. To put it another way, a C compiler takes code that it promises will run according to the rules of the C standard and turns it into assembly code that will in fact run according to the rules of the C standard by relying on the CPU to run according to the rules of its architecture specification.

So if those two stores happen to be to the same memory location, no "optimization" is possible that permits the wrong result to be observed from that very same thread. That would violate the C standard, so a non-broken compiler will generate whatever code it takes for a non-broken CPU to not do that. However, nothing prevents optimizations that might cause other threads to see strange intermediate results unless your platform's threading standard says something different (and none that I know of does).

If you want cross-thread guarantees, you have to use atomic operations, mutexes, or whatever else your platform provides for that. If you just want code that happens to work, you'll need platform-specific knowledge about what optimizations your platform is actually capable of and what methods there are to disable them on that platform (volatile, compiler flags, whatever).
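For the cross-thread case, one possible shape using C11 atomics is sketched below. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical and chosen only for illustration; the sketch assumes a compiler with `<stdatomic.h>` support. A release store paired with an acquire load ensures that a thread which sees `ready` as true also sees the earlier write to `payload`.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data, published through the flag */
static atomic_bool ready;  /* static storage: starts out false */

/* Writer side: store the data, then set the flag with release ordering
   so the data store cannot become visible after the flag store. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above.
   If the flag is seen as true, the payload written before it is visible. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In a real program `publish` and `try_consume` would run on different threads. The point of the sketch is only that the ordering is requested explicitly instead of being assumed from the hardware.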

David Schwartz