Suppose I have code such as

#include <stdint.h>

namespace
{
struct Thing {
    uint8_t a;
    uint8_t b;
    uint8_t c;
    uint8_t d;
};

auto& magicRegister = *reinterpret_cast<volatile uint8_t*>(0x1234);
auto magicMemory = reinterpret_cast<Thing*>(0x2000);  // 0x2000 - 0x20ff
}

int main()
{
    magicMemory[0] = { .a = 1, .b = 2, .c = 3, .d = 4 };
    magicRegister = 0xff;   // Transfer

    magicMemory[0].d = 5;
    magicRegister = 0xff;   // Transfer
}

in which writing 0xff to the byte at address 0x1234 makes the system copy 256 bytes from the region starting at 0x2000 to inaccessible memory. The contents of that region only matter at the moment the transfer takes place.

How can I guarantee that GCC's optimization won't mess up the order of operations?


I see a few options.

  1. Make magicMemory volatile: I dislike this, because the contents of the memory region don't matter until the transfer is initiated.
  2. Add asm("" ::: "memory"); before each transfer: simple and effective, but may prevent some unrelated code from being properly optimized.
int main()
{
    magicMemory[0] = { .a = 1, .b = 2, .c = 3, .d = 4 };
    asm("" ::: "memory");
    magicRegister = 0xff;   // Transfer

    magicMemory[0].d = 5;
    asm("" ::: "memory");
    magicRegister = 0xff;   // Transfer
}
  3. Same as #2, but add
asm(""
    :
    : "m"(*reinterpret_cast<uint8_t*>(0x2000)),
      "m"(*reinterpret_cast<uint8_t*>(0x2001)),
      "m"(*reinterpret_cast<uint8_t*>(0x2002)),
      // ...
      "m"(*reinterpret_cast<uint8_t*>(0x20fd)),
      "m"(*reinterpret_cast<uint8_t*>(0x20fe)),
      "m"(*reinterpret_cast<uint8_t*>(0x20ff)));

instead: stupid, but technically should work. Sadly, it crashes the GCC port I'm using.


EDIT: What if I also had this:

uint8_t unrelatedVariable;

int main()
{
    unrelatedVariable = 5;   // This is redundant and should be optimized away

    // same as before

    unrelatedVariable = 6;
}

Is it possible to fix the ordering of accesses to magicRegister and magicMemory while not affecting unrelatedVariable?

old_timer
LHLaurini

1 Answer

The right way to do this is to use memory barriers or atomics to enforce the ordering of the writes. Indeed, not only can GCC reorder writes, but most modern processors can also reorder memory instructions (although many of them do not reorder writes with other writes).

A memory fence (std::atomic_thread_fence) establishes memory synchronization ordering of non-atomic and relaxed atomic accesses, as instructed by its order argument, without an associated atomic operation. Here is how to use fences in your case:

#include <atomic>   // std::atomic_thread_fence

int main()
{
    magicMemory[0] = { .a = 1, .b = 2, .c = 3, .d = 4 };

    // Release fence: reads and writes that precede it in this thread
    // cannot be reordered past any store that follows it.
    std::atomic_thread_fence(std::memory_order_release);

    // Here we are sure that magicMemory has been written
    magicRegister = 0xff;   // Transfer

    // This fence keeps the next magicMemory update from moving before the transfer
    std::atomic_thread_fence(std::memory_order_release);

    magicMemory[0].d = 5;

    // Same as before
    std::atomic_thread_fence(std::memory_order_release);

    magicRegister = 0xff;   // Transfer
}

Note that if you need to also read data between the writes, then it is probably wise to use the std::memory_order_acq_rel ordering for the memory barriers.
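
To avoid forgetting one of the fences at a call site, the fence / register write / fence sequence above can be wrapped in a small helper. The following is only a sketch: startTransfer and runExample are hypothetical names, and fakeMemory and fakeRegister are ordinary objects standing in for the fixed addresses 0x2000 and 0x1234 so that the snippet can be compiled and run on a normal host. Whether the fence instruction emitted for your target also orders accesses to the device is platform-specific and not something this sketch can show.

```cpp
#include <atomic>
#include <cstdint>

struct Thing {
    std::uint8_t a;
    std::uint8_t b;
    std::uint8_t c;
    std::uint8_t d;
};

// Assumption: stand-ins for the memory-mapped locations, so this runs on a host.
// On the real target these would be the reinterpret_cast expressions
// from the question (0x2000 and 0x1234).
Thing fakeMemory[64];                // 64 * 4 bytes = 256 bytes
volatile std::uint8_t fakeRegister;

Thing* const magicMemory = fakeMemory;
volatile std::uint8_t& magicRegister = fakeRegister;

// Hypothetical helper: fence, trigger the transfer, fence again.
inline void startTransfer()
{
    // Earlier writes to magicMemory may not move past the register write.
    std::atomic_thread_fence(std::memory_order_release);
    magicRegister = 0xff;   // Transfer
    // The register write may not move past later writes to magicMemory.
    std::atomic_thread_fence(std::memory_order_release);
}

// Same sequence as in the question; returns the final value of d.
std::uint8_t runExample()
{
    magicMemory[0] = Thing{1, 2, 3, 4};
    startTransfer();

    magicMemory[0].d = 5;
    startTransfer();

    return magicMemory[0].d;
}
```

The helper keeps both fences next to the register write, so the ordering requirement lives in one place instead of being repeated around every transfer.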

You can find information about memory ordering here.

C++ atomics can help if you need to write the whole Thing structure in one shot and control the visibility of that write before the transfer. However, before C++20 you cannot perform atomic operations on an arbitrary existing memory location; std::atomic_ref, which allows this, was only added in C++20.

Finally, yes, volatile does not really do what you want here (the compiler cannot reorder volatile writes relative to each other, but the processor still can, as on some PowerPC processors). Here is how cppreference summarizes it:

Within a thread of execution, accesses (reads and writes) through volatile glvalues cannot be reordered past observable side-effects (including other volatile accesses) that are sequenced-before or sequenced-after within the same thread, but this order is not guaranteed to be observed by another thread, since volatile access does not establish inter-thread synchronization.
In addition, volatile accesses are not atomic (concurrent read and write is a data race) and do not order memory (non-volatile memory accesses may be freely reordered around the volatile access).

You can also find additional information about the interaction between volatile accesses, memory fences and CPU write reordering in this post and this post.

Jérôme Richard
  • Thank you for your answer. As I understand, that will also affect unrelated memory accesses, right? Is it possible for it to only affect `magicRegister` and `magicMemory`? See the example I added to my question. – LHLaurini Mar 21 '21 at 13:53
  • Yes, that will also affect unrelated memory accesses. As for restricting the effect: AFAIK, that is not possible for the writes in this case (for reads, you can play with atomic references and `std::memory_order_consume`, although its use is discouraged). Ideally, `unrelatedVariable` and `magicMemory` would be optimized differently in your case, since `magicRegister` and `magicMemory` are implicitly linked by their special locations in memory, but the compiler is not aware of that. I doubt there is a generic (i.e. hardware-independent) way to tell the compiler about this, even specifically with GCC. – Jérôme Richard Mar 21 '21 at 15:17
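
One GCC-specific approximation of the narrower barrier discussed in these comments is an empty asm statement that names only the transfer window and the register as memory operands, instead of clobbering all of memory. The sketch below rests on several assumptions: TransferWindow, windowBarrier, startTransfer and runExample are hypothetical names, fakeWindow and fakeRegister are ordinary objects standing in for 0x2000 and 0x1234 so that the code runs on a host, and the ordering argument relies on GCC's extended asm operand semantics as I understand them, not on anything the C++ standard promises. It is a compiler-level barrier only and emits no hardware fence. It uses a single struct-typed "m" operand in place of the 256 byte-sized operands from option 3; whether that avoids the crash in a particular GCC port is untested.

```cpp
#include <cstdint>

struct Thing {
    std::uint8_t a;
    std::uint8_t b;
    std::uint8_t c;
    std::uint8_t d;
};

// The whole 256-byte region as one object, so one "m" operand covers it.
struct TransferWindow {
    Thing things[64];
};
static_assert(sizeof(TransferWindow) == 256, "window must span 256 bytes");

// Assumption: host stand-ins for the memory-mapped locations.
TransferWindow fakeWindow;
volatile std::uint8_t fakeRegister;
std::uint8_t unrelatedVariable;

TransferWindow& window = fakeWindow;
volatile std::uint8_t& magicRegister = fakeRegister;

// Hypothetical helper: tells the compiler that this (empty) asm reads the
// window and reads/writes the register. No "memory" clobber is used, so
// other objects such as unrelatedVariable are not mentioned at all.
inline void windowBarrier()
{
    asm volatile("" : "+m"(magicRegister) : "m"(window));
}

inline void startTransfer()
{
    windowBarrier();        // window stores above must be done by here
    magicRegister = 0xff;   // Transfer
    windowBarrier();        // window stores below must stay below
}

// Returns the final value of d.
std::uint8_t runExample()
{
    unrelatedVariable = 5;  // not an asm operand: the compiler stays free to drop this

    window.things[0] = Thing{1, 2, 3, 4};
    startTransfer();

    window.things[0].d = 5;
    startTransfer();

    unrelatedVariable = 6;
    return window.things[0].d;
}
```

Because the asm lists the window as an input, the compiler has to treat every earlier store to it as live and cannot hoist later stores above the barrier, while stores to objects that are not operands remain ordinary candidates for optimization.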