
From here: https://stackoverflow.com/a/2485177/462608

For thread-safe accesses to shared data, we need a guarantee that
  • the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until much later)
  • that no reordering takes place. Assume that we use a volatile variable as a flag to indicate whether or not some data is ready to be read. In our code, we simply set the flag after preparing the data, so all looks fine. But what if the instructions are reordered so the flag is set first?
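
To make the quote concrete, here is a minimal sketch of the pattern it describes (the names sharedData and dataReady are just made up for illustration):

int  sharedData = 0;
bool dataReady  = false;   // plain (or volatile) bool used as a "ready" flag

void producer()
{
    sharedData = 42;       // prepare the data
    dataReady  = true;     // then set the flag -- but this store may become visible first
}

void consumer()
{
    while (!dataReady) { } // the compiler may also cache dataReady in a register here
    int x = sharedData;    // may observe stale data if the stores were reordered
}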

  • In which cases does the compiler store the value in a register and defer updating main memory? [with respect to the above quote]
  • What is the "re-ordering" that the above quote is talking about? In what cases does it happen?
Aquarius_Girl

2 Answers


Q: In which cases does the compiler store the value in a register and defer updating main memory?

A: (This is a broad and open-ended question which is perhaps not very well suited to the stackoverflow format.) The short answer is that whenever the semantics of the source language (C++ per your tags) allow it and the compiler thinks it's profitable.
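
For illustration (my own sketch, not part of the original answer): in a loop like the one below, an optimizing compiler is typically allowed to keep the counter in a register for the whole loop and write it back to memory only once at the end, or even fold the loop into a single addition:

int counter = 0;

void bump()
{
    for (int i = 0; i < 10000; ++i)
        ++counter;   // likely kept in a register; memory may only be updated after the loop
}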

Q: What is the "re-ordering" that the above quote is talking about?

A: That the compiler and/or CPU issues load and store instructions in an order different from the one dictated by a 1-to-1 translation of the original program source.

Q: In what cases does it happen?

A: For the compiler, similarly to the answer of the first question, anytime the original program semantics allow it and the compiler thinks it's profitable. For the CPU it's similar, the CPU can, depending on the architecture memory model, typically reorder memory accesses as long as the original (single-threaded!) result is identical. For instance, both the compiler and the CPU can try to hoist loads as early as possible, since load latency is often critical for performance.
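
As an illustrative sketch (the names are hypothetical), hoisting a load out of a loop is one such reordering: nothing in the loop below writes to memory, so the compiler may load threshold once before the loop instead of on every iteration:

int threshold = 10;   // some global the loop only reads

int countBelow(const int* v, int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (v[i] < threshold)   // this load may be hoisted above the loop
            ++count;
    return count;
}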

In order to enforce stricter ordering, e.g. for implementing synchronization primitives, CPUs offer various atomic and/or fence instructions, and compilers may, depending on the compiler and source language, provide ways to prohibit reordering.
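
For example, in C++11 one way to express this is with atomics and fences; here is a hedged sketch (my names, not from the answer):

#include <atomic>

std::atomic<bool> ready{false};
int payload = 0;

void producer()
{
    payload = 1;
    std::atomic_thread_fence(std::memory_order_release); // earlier writes stay above the fence
    ready.store(true, std::memory_order_relaxed);
}

void consumer()
{
    while (!ready.load(std::memory_order_relaxed)) { }
    std::atomic_thread_fence(std::memory_order_acquire); // later reads stay below the fence
    int x = payload;   // guaranteed to see 1 once the flag was observed as true
}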

janneb
  • *"(This is a broad and open-ended question which is perhaps not very well suited to the stackoverflow format.)"* That;'s why I presented the quote. It may be answered with reference to the quoted text. – Aquarius_Girl May 28 '12 at 08:34
  • *"The short answer is that whenever the semantics of the source language (C++ per your tags) allow it and the compiler thinks it's profitable."* I meant to ask that "what" happens by storing the value in a register, and later on updating the memory? Why is that method helpful? – Aquarius_Girl May 28 '12 at 08:36
  • *"That the compiler and/or CPU issues load and store instructions in an order different from the one dictated by a 1-to-1 translation of the original program source."* Why would compiler do that? After all doesn't it have to execute the instructions in the given order? – Aquarius_Girl May 28 '12 at 08:37
  • Because register-based architectures are designed such that accessing data in registers is quicker than from memory. Hence an optimizing compiler will try to keep the most used variables in registers. – janneb May 28 '12 at 08:40
  • @AnishaKaul: Wrt your 2nd question, because the compiler tries to generate the fastest possible code, and it thinks it's worth doing so. – janneb May 28 '12 at 08:41
  • So, when the above quote says *"the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until much later)"* Does this mean that for thread safety, we want the compiler to store the value in the register or we don't? [sorry, english problems] – Aquarius_Girl May 28 '12 at 08:48
  • @AnishaKaul: Well, that depends on what the algorithm in the source code is trying to do, but yes, contents of registers aren't visible outside a processor, so if one wants a variable to be seen by other processors then one must force it to be stored to memory. – janneb May 28 '12 at 08:53
  • *"ontents of registers aren't visible outside a processor"* Thanks, that explains it. volatile is used when we want to tell the compiler that this variable will be modified by some external source. So, when during multithreading on multiple processors at the same time, we may want a thread from an another processor to edit our variable, so we make it volatile by asking the compiler not to store it in the regiaters? Is my understanding correct? – Aquarius_Girl May 28 '12 at 08:58
  • @AnishaKaul: Basically yes, however as the answer by "jalf" you link to says, volatile goes part of the way but isn't sufficient. Hence why e.g. C11/C++11 introduced "atomic" variables, and before that compilers provided various nonstandard extensions to do the same (IIRC MSVC basically overloaded volatile to mean what atomic means in the current standard). – janneb May 28 '12 at 09:08
  • That link also says: *"but it doesn't help us in multithreaded code where the volatile object is often only used to synchronize access to non-volatile data."* Should I request for a clarification on this in a separate thread? – Aquarius_Girl May 28 '12 at 09:13
  • @AnishaKaul: I think at this point you'd be best served by a better understanding of the fundamentals. I recommend "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson. – janneb May 28 '12 at 09:25
janneb, I just checked it now. That book sounds promising. I will definitely read it. :) Thanks, but currently I need a clarification of that sentence; reading the whole book right now would take a huge amount of time. Never mind, I'll search Google before asking for that clarification. ***I could understand your "answer" only through the discussion we had in the comments. I request you to put that discussion into your answer, so that I can select it. In its current form, it isn't very understandable.***

Well... found this when searching for the "volatile" keyword. 1. Register access is a lot faster than memory access, even with caches. For example, if you have something like this:

int i;
for (i = 0; i < 10000; i++)
{
    // whatever...
}

If variable i is stored in a register, the loop gets much better performance, so some compilers may generate code that keeps i in a register. The update to that variable might not happen in memory until the loop ends. It's even entirely possible that i is never written to memory at all (e.g., i is never used later) or that it is spilled inside the loop body (e.g., there is a heavier nested loop inside to be optimized and no registers are left for it). This technique is called register allocation. Generally there are no rules for the optimizer beyond what the language standard allows, and there are tons of different algorithms for it, so it's hard to say exactly when it happens. That's why janneb said so.

If a variable is not updated in memory in time, that can be really bad for multi-threaded code. For example, if you have code like this:

bool objReady = false;
createThread(prepareObj);  // objReady will be turned on in the prepareObj thread.
while(!objReady) Sleep(100);
obj->doSomething();

It's possible that the optimizer generates code that tests objReady only once (when control flow enters the loop), since it's not changed inside the loop. That's why we need to make sure that reads and writes really happen as we designed them in multi-threaded code.
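
One way to guarantee the re-read (a sketch using C++11 std::atomic; this is my addition, not the original code) is to make the flag atomic, which forces the generated code to load it on every iteration of the loop:

#include <atomic>

std::atomic<bool> objReady{false};   // instead of a plain bool

void waitForObj()
{
    while (!objReady.load())         // this load cannot be optimized away or hoisted
        ;                            // (could sleep here as in the snippet above)
}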

Reordering is more complicated than register allocation. Both the compiler and your CPU might change the execution order of your code.

void prepareObj()
{
    obj = &something;
    objReady = true;
}

From the prepareObj function's point of view, it doesn't matter whether we set objReady first or the obj pointer first. The compiler and CPU might reverse the order of the two instructions for different reasons, like better parallelism in a particular CPU pipeline or better data locality for cache hits. You can read the book "Computer Architecture: A Quantitative Approach" suggested by janneb. If my memory serves, appendix A is about reordering (if not, check appendix B or C).
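
For completeness, a hedged C++11 sketch of how atomics pin that order down (my addition, with a made-up Something type): a release store on the flag keeps the store to obj before it, and an acquire load on the reader's side keeps the read of obj after it.

#include <atomic>

struct Something { void doSomething() {} };
Something something;

Something* obj = nullptr;
std::atomic<bool> objReady{false};

void prepareObj()
{
    obj = &something;
    objReady.store(true, std::memory_order_release);       // the obj store cannot move below this
}

void useObj()
{
    while (!objReady.load(std::memory_order_acquire)) { }  // the obj read cannot move above this
    obj->doSomething();
}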

user1192878