9

I am looking at an open source C++ project which has the following code structure:

while(true) {
  // Do something work

  if(some_condition_becomes_true)
     break;

  __asm volatile ("pause" ::: "memory");
}

What does the last statement do? I understand that __asm means that it is an assembly instruction and I found some posts about pause instruction which say that the thread effectively hints the core to release resources and give other thread more resources (in context of hyper-threading). But what does ::: do and what does memory do?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
John Elaine
  • 359
  • 5
  • 22
  • Inline assembler like that is very compiler-specific. You might want to read some documentation of the compiler to find out. From the syntax I assume it's the [GCC compiler](https://gcc.gnu.org/onlinedocs/). – Some programmer dude May 19 '18 at 18:52
  • 2
    It is the gcc format for [extended assembly](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html). The `:` are just separators between (unused) optional arguments. – Bo Persson May 19 '18 at 18:55
  • 3
    the "memory" part tells gcc that memory may have been modified inside the asm block. While it's not true (`pause` doesn't affect memory content), it's used as partial anti-reorder barrier for the compiler. Together with `volatile` keyword it's almost sure the `pause` will be at the end of the while-body and nothing from the code ahead will be moved after the `pause`, actually it may even prevent the optimizer to unroll the loop (or at least make it less likely). Which look as something what the author probably did want. – Ped7g May 19 '18 at 19:15
  • @Ped7g You should probably turn this into an answer... – hlt May 19 '18 at 19:59
  • Not a duplicate, but the pause is discussed in a related question: https://stackoverflow.com/questions/4725676/how-does-x86-pause-instruction-work-in-spinlock-and-can-it-be-used-in-other-sc – Michael Petch May 19 '18 at 20:14
  • 1
    @hlt I didn't want, because I don't fully understand extended inline assembly in gcc (and all the implications caused by that line in the source), I'm more like x86 assembly master, but I use `gcc` only as C and C++ compiler, the inline asm stuff is very tricky and most often pointless (using separate asm source file is often better), although in this particular case the usage looks reasonable. I still guessed there was some C/C++ way of achieving that, like `_mm_pause()` in the answer from Peter Cordes, but I don't know these. So in the end the better answer appeared. :) – Ped7g May 20 '18 at 06:32
  • @BoPersson The link in your comment shows `__asm__` being used, not `__asm`. – AJM Jan 24 '23 at 16:23

1 Answers1

10

It's _mm_pause() and a compile memory barrier wrapped into one GNU C Extended ASM statement. https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

asm("" ::: "memory") prevents compile-time reordering of memory operations across it, like C++11 std::atomic_signal_fence(std::memory_order_seq_cst). (not atomic_thread_fence; although on x86 preventing compile-time reordering is sufficient to make it an acquire + release fence because the only run-time reordering that x86 allows is StoreLoad.) See Jeff Preshing's Memory Ordering at Compile Time article.

Making the asm instruction part non-empty also means those asm instructions will run every time the C logically runs that source line (because it's volatile).

pause prevents speculative loads from causing memory-ordering mis-speculation pipeling clears (aka machine nukes). It's useful inside spin loops that are waiting to see a value in memory.

You might find this statement inside a spinloop written without C++11 std::atomic, to tell the compiler it has to re-read the value of a global variable. (Because the "memory" clobber means the compiler has to assume the asm statement might have modified the value of any globally-reachable memory.)

This looks like the context where you found it: some_condition_becomes_true probably includes reading a non-atomic / non-volatile global.

The C++11 equivalent of your loop:

#include <atomic>
#include <immintrin.h>
std::atomic<int> flag;

void wait_for_flag(void) {
    while(flag.load(std::memory_order_seq_cst == 0) {
        _mm_pause();
    }
}

(Not exactly equivalent, because your version has a full compiler barrier while mine only has a seq-cst load, so it's not a full signal-fence. But probably what wasn't needed, and they just used something stronger than necessary to get the effect of volatile).


Without the barrier or making flag atomic, the compiler would have optimized it to:

// Do something work

if(some_condition_becomes_true) {
    // empty
} else {

  while(true) {
     // Do something work
     __asm volatile ("pause" ::: );  // no memory clobber
  }
}

i.e. it would hoist the check on some_condition_becomes_true out of the loop and not re-read the global every time.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847