I'm reading some legacy C++ code in which a memory barrier is defined as below. The main target OSes are Linux and VxWorks, and the compiler is gcc (Wind River's gcc).

#if((KCompilerGNU)||(kCompilerWindRiver))
   #define MEMORY_BARRIER   __asm__ volatile("nop\n");
#else
   #define MEMORY_BARRIER   __asm nop;
#endif

But I don't see how a no-op instruction can act as a memory barrier. Or is it just a faulty implementation?

Eric Z
  • you probably got the wrong interpretation of memory barrier. you probably take it to mean *fence*. the person who wrote it probably meant it as a filler. i.e. for padding code to align to certain boundary. – thang Feb 04 '13 at 01:24
  • The legacy code is using it as the purpose of a memory fence in lots of places that you can clearly tell. Btw, how does a nop serve as the alignment purpose anyway? – Eric Z Feb 04 '13 at 01:26
  • it's a filler. if you know your code is 15 bytes and need to pad it to 16 bytes (on x86) before more stuff, sometimes people insert a nop. here: http://stackoverflow.com/questions/234906/whats-the-purpose-of-the-nop-opcode – thang Feb 04 '13 at 01:27

1 Answer

This is a compiler barrier, not a full hardware memory barrier. That is, it is intended to be an opaque statement that the compiler can't optimize across, but it has no effect on how the hardware reorders memory operations [1]. It may be defined correctly for that purpose if the compilers in question do in fact treat asm blocks as opaque (for example, gcc has specific rules defining exactly what an asm block may be assumed to change).

It may be appropriate to call it a full memory barrier (which usually suppresses both compiler and hardware reordering) if you know the hardware this code targets has a strong memory model that never reorders memory operations.

[1] That said, such a barrier could still be sufficient if the program is single-threaded or the machine doesn't exhibit interesting reorderings (e.g., a simple in-order, non-speculative CPU or a single-CPU system).
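
To make the distinction concrete, here is a minimal sketch, assuming a GCC-compatible compiler (detected via `__GNUC__`) and C++11 or later. The macro names `COMPILER_BARRIER` and `FULL_MEMORY_BARRIER` and the variables are hypothetical and do not come from the legacy code; whether Wind River's toolchain accepts this exact asm syntax is something you would need to check.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical macro names, for illustration only.
#if defined(__GNUC__)
// An empty asm statement with a "memory" clobber emits no instruction.
// The compiler must assume memory may have changed, so it does not move
// memory accesses across this point or keep memory values cached in registers.
// It does NOT stop the CPU from reordering anything.
#  define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#  error "unsupported compiler: define COMPILER_BARRIER for this toolchain"
#endif

// Standard C++11 fence: constrains the compiler and, where the target
// needs it, emits a hardware fence instruction as well.
#define FULL_MEMORY_BARRIER() std::atomic_thread_fence(std::memory_order_seq_cst)

int g_payload = 0;
int g_ready = 0;

// Single-threaded demo: the barriers do not change the result here,
// they only constrain how the stores may be ordered.
int publish(int value) {
    g_payload = value;
    COMPILER_BARRIER();      // compiler keeps the payload store before the flag store
    g_ready = 1;
    FULL_MEMORY_BARRIER();   // compiler and hardware ordering
    return g_payload + g_ready;
}
```

Note that the macros here are written without a trailing semicolon so they can be used like ordinary statements, which is a design choice of this sketch rather than something the original code does.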

BeeOnRope
  • But shouldn't we use `asm volatile("" ::: "memory");` for compiler barrier? – Eric Z Feb 04 '13 at 01:36
  • Sure, on gcc, or compilers that support the gcc asm syntax, but I have no idea what compilers the #else case is actually targeting. For all I know the #else case is not targeting gcc but some other compilers which never look inside the asm block at all, and always insert it at its location in the source. If I had my way, the #ifs would cover all compilers explicitly, with a final #else that fails with an "unsupported compiler" error. I'm not sure how Wind River's gcc asm syntax differs from gcc. You should check whether just volatile is enough, or if clobber lists are required. – BeeOnRope Feb 04 '13 at 01:44
  • FWIW, it seems to me that recent gcc documentation implies that volatile alone is not enough - that you need volatile and [memory] in the clobber list, as you suggested. If this code is old, it may have worked in the past, but may no longer work, etc. Of course, compiler-only barriers are highly suspect anyway if your hardware does any reordering. – BeeOnRope Feb 04 '13 at 01:59
  • So this needs "memory" to block the CPU reordering things like reads/writes? Isn't there a standard C++ command for this? – huseyin tugrul buyukisik Jan 13 '18 at 16:38
  • @huseyintugrulbuyukisik - when you say "this needs to" - are you talking about the code in the OP? What it needs to do depends on what type of memory barrier is expected/needed: perhaps the implementation expects this "only" to be a compiler barrier, in which case the implementation might be OK. If they expected it also to prevent CPU re-ordering, and they are running on a platform that does such reordering, then it is insufficient. Yes, C++ offers standardized barriers these days such as `atomic_thread_fence` and `atomic_signal_fence`, the latter being equivalent to a "compiler barrier". – BeeOnRope Jan 13 '18 at 18:31
  • @BeeOnRope yes I was talking about the code in OP but knowing atomic_thread_fence forces a cpu barrier is good. – huseyin tugrul buyukisik Jan 13 '18 at 18:54
  • @huseyintugrulbuyukisik yes, `atomic_thread_fence` forces a full barrier (CPU, compiler), while `atomic_signal_fence` forces a compiler-only barrier (the "signal" in the name is a hint that it is useful for enforcing signal safety in single-threaded programs, rather than thread-safety). Note that using the new fences is generally discouraged in favor of operations on `std::atomic<>` objects instead: the latter are often easier to reason about and more efficient. – BeeOnRope Jan 13 '18 at 20:13
  • @BeeOnRope Thank you. I was just curious about cpu reordering my single thread non-register Kahan Summation algorithm and breaking it when migrated to another computer or compiler. I was using hand-written asm block sse codes. – huseyin tugrul buyukisik Jan 13 '18 at 21:09
  • @huseyintugrulbuyukisik - if your algorithm is single threaded, you generally never need any fences or any atomic/ordering stuff at all, since the CPU will preserve the expected "program order" from the perspective of the single thread executing the program. Reordering is only visible from another thread (or potentially from a signal handler but that applies to few programs). – BeeOnRope Jan 13 '18 at 21:10
  • @BeeOnRope sometimes the compiler optimizes code out instead of reordering it. That needs more compiler-dependent flags in the program. I used asm sse for that so it would work on an sse-supporting cpu and compiler (if asm styles match, of course). If a fence works for multiple threads, it should also work for a single thread too? (against optimizations) – huseyin tugrul buyukisik Jan 13 '18 at 21:12
  • If a compiler optimizes out, then it doesn't have much to do with barriers. I think you need a separate question explaining what you are trying to do, because it no longer seems to have much of a relationship to the OP... – BeeOnRope Jan 13 '18 at 21:15
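
The standard facilities mentioned in the comments above can be sketched as follows. This is an illustrative example only, assuming C++11 or later; the names `payload`, `ready`, `produce` and `try_consume` are hypothetical. In real use the producer and consumer would run on different threads; they are plain functions here so the snippet stays self-contained.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the payload

// Usually preferred: attach the ordering to the atomic operation itself.
void produce(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);  // payload write cannot move after this
}

bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) {  // payload read cannot move before this
        return false;
    }
    out = payload;
    return true;
}

// Equivalent intent using standalone fences plus relaxed atomic operations.
void produce_with_fence(int value) {
    payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

bool try_consume_with_fence(int& out) {
    if (!ready.load(std::memory_order_relaxed)) {
        return false;
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    out = payload;
    return true;
}

// Compiler-only barrier: restricts compiler reordering, emits no fence
// instruction. Intended for ordering against a signal handler on the same thread.
void compiler_only_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}
```

Release/acquire is used here because it is enough for the publish pattern shown; a `std::memory_order_seq_cst` fence would be the stronger option closest to what people usually mean by a "full" barrier.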