
I want to ensure the following three statements execute in the exact order specified:

auto t1 = std::chrono::steady_clock::now(); // Statement 1
auto t2 = std::chrono::system_clock::now(); // Statement 2
auto t3 = std::chrono::steady_clock::now(); // Statement 3

The compiler (or processor) is free to reorder these statements as there is no data dependency. See https://stackoverflow.com/a/38025837/1520427

C++11 added std::atomic_signal_fence to "establish memory synchronization ordering of non-atomic and relaxed atomic accesses, as instructed by order, between a thread and a signal handler executed on the same thread." However, according to cppreference, "no CPU instructions for memory ordering are issued" so I'm unclear how this would stop the processor from reordering things.

My questions:

Is the following code sufficient to stop the compiler from reordering statements (assuming it has local definitions for all of the code)?

auto t1 = std::chrono::steady_clock::now(); // Statement 1
std::atomic_signal_fence(std::memory_order_release);
auto t2 = std::chrono::system_clock::now(); // Statement 2
std::atomic_signal_fence(std::memory_order_release);
auto t3 = std::chrono::steady_clock::now(); // Statement 3
std::atomic_signal_fence(std::memory_order_release);

Is the processor free to rearrange the order of these operations? e.g. can it execute them 2-1-3? If so, would std::atomic_thread_fence prevent it?

Do I need to introduce an artificial data dependency (as in the linked question) to get the intended behaviour?

curiousguy
user1520427
  • The idea of a compiler reordering arbitrary operations is crazy on its face. – curiousguy Oct 26 '19 at 06:15
  • > *compiler (or processor) is free to reorder these statements* Says who? The declarations are sequenced and the embedded expressions in them have side effects. – Kaz Oct 26 '19 at 06:18

1 Answer


You don't need to do anything to block compile-time reordering: non-inline function calls are black boxes that might interact with each other through global variables, so the compiler can't reorder them.

Or, if std::chrono::steady_clock::now() can be fully inlined (perhaps using inline asm to read a timestamp), a correct implementation of now() will use something like a volatile access, or GNU C asm volatile, to make sure it can't reorder with other now() calls. (More importantly, this makes sure it can't be CSE'd and hoisted out of a loop, which would create the illusion of everything taking zero time.)

Unlike in the question you linked, the things you care about ordering are not simple computations like z = x + y; they're special function calls to what are normally library functions. I didn't check the specs, but I wouldn't be surprised if time-getting functions have some kind of rule about being ordered with respect to each other. Certainly a good-quality implementation would want to do that for you.


Is the processor free to rearrange the order of these operations?

This is semi-plausible, but unlikely on real implementations: now() usually runs quite a few instructions, comparable in number to the size of the out-of-order execution window. (e.g. Skylake's ROB holds 224 uops; a single rdtsc is about 20 uops on its own, and there's a bunch of scaling work on top of that.)

OoO exec is normally done on an oldest-ready-first basis, so multiple repeats of the same now() function are unlikely to exec out of order.

If system_clock and steady_clock use totally different now() implementations, and neither does any barriering itself, you might want to use an implementation-specific mechanism to block OoO exec, e.g. _mm_lfence() on x86.

For example, if system_clock::now() is a low-overhead function that just reads a volatile memory location (in a page exported by the kernel and updated by an interrupt handler), but steady_clock::now() uses rdtsc, reordering is plausible. But there's no portable way to stop it.


However, according to cppreference, "no CPU instructions for memory ordering are issued" so I'm unclear how this (atomic_signal_fence) would stop the processor from reordering things.

It doesn't. That's not the point and not what it's for. Out-of-order execution makes sure to preserve the illusion (for a single thread) that it ran in program order.

Therefore atomic_signal_fence only needs to make sure the asm program order matches the C++ abstract machine's source order, so that signal handlers running in the same thread (or interrupt handlers on the same core) observe this thread's operations happening in program order. Or vice versa, for stores done by a signal handler.

You are correct that your attempt wouldn't work. It could only possibly help on an (IMHO broken) implementation that allowed compile-time reordering of now() functions, and then probably only as a side effect of how atomic_signal_fence() might be defined, e.g. as something like GNU C asm volatile("":::"memory"). Even then, if now() were broken and used a non-volatile asm statement (so multiple now() calls could CSE with each other), an asm volatile statement wouldn't order them.

Peter Cordes
  • "non-inline function calls" are not necessarily black boxes. Whole-program optimizers see a lot, and some compilers have tables of what standard library calls do. – Arch D. Robison Oct 25 '19 at 02:31
  • @ArchD.Robison: Yes, my phrasing was an oversimplification; IPO (inter-procedural optimization) can happen even without actual inlining. But *most* non-template library functions are opaque, only the most critical ones like `sqrt`, `printf`, and of course `memcpy`, `strcpy` and so on, get treated as compiler built-ins. But however an implementation works, I'd consider it broken if it does in practice reorder calls to its own standard library `now()` functions. – Peter Cordes Oct 25 '19 at 02:43
  • @ArchD.Robison: Whole-program / link-time optimization doesn't usually happen into libraries; they're usually not built as fat binaries with machine code + intermediate-representation like LLVM bytecode or GCC GIMPLE. – Peter Cordes Oct 25 '19 at 02:45
  • Intermediate representation is not required. All that is required is a summary of side effects. I think I've seen a closed-source compiler build such summaries for its own libraries. But I concur that a compiler that did not respect `now()` would be considered broken. – Arch D. Robison Oct 26 '19 at 04:40
  • @ArchD.Robison: Interesting; with gcc and clang it's either full LTO or nothing. But yeah that makes sense, a bit of pureness or symbols read/written metadata could allow optimizations. (And a list of actually-clobbered registers could also be useful to callers) – Peter Cordes Oct 26 '19 at 04:45