51

Suppose A, B, a, and b are all variables, and the addresses of A, B, a, and b are all different. Then, for the following code:

A = a;
B = b;

Do the C and C++ standard explicitly require A=a be strictly executed before B=b? Given that the addresses of A, B, a, and b are all different, are compilers allowed to swap the execution sequence of two statements for some purpose such as optimization?

If the answer to my question is different in C and C++, I would like to know both.

Edit: The background of the question is the following. In board game AI design, for optimization people use lock-less shared-hash table, whose correctness strongly depends on the execution order if we do not add volatile restriction.

ACcreator
  • 1,362
  • 1
  • 12
  • 29
  • 19
    Even if the compiler was guaranteed to generate code in that order, the CPU itself does out of order execution. – KitsuneYMG Sep 15 '14 at 11:51
  • 9
    Not only compilers are allowed to do this, CPUs are allowed to do this, memory controllers are allowed to do this, caches are allowed to do this, and so on. – David Schwartz Sep 15 '14 at 12:06
  • I just added an interesting example that shows this practically and not just theoretically. – Shafik Yaghmour Sep 15 '14 at 12:12
  • 9
    Any time you're doing multi-threading you get into another dimension entirely. Even if the code is executed sequentially you have no guarantee (without further controls) that execution will appear sequential as viewed from another processor. If you're trying to do something similar to your shared hash table you need to spend A LOT of time studying up on synchronization issues. – Hot Licks Sep 15 '14 at 12:13
  • Indeed, this is the wrong question. What matters is visibility of the changes to other threads. Cache coherency protocols can, in the absence of memory fences, cause reordering of memory accesses. – Ben Voigt Sep 15 '14 at 14:38
  • @BenVoigt Thanks for your comments, but I don't quite understand. If the two statements did execute in order, would it be possible that at a certain point, another thread might have seen `B` become `b` but haven't seen `A` become 'a'? – ACcreator Sep 15 '14 at 15:00
  • 3
    @ACcreator: Yes, that's possible, depending on the particular cache coherency protocol in use. For example, x86 provides stronger guarantees than Itanium. With multithreading you also need to be concerned about tearing, and speculative writes. – Ben Voigt Sep 15 '14 at 15:01
  • @BenVoigt Thank you. For x86, could in-order execution guarantee in-order visibility of the changes? Apart from using lock, to solve this problem on general platforms (x86, Itanium, arm...), does there exist any better approaches? – ACcreator Sep 15 '14 at 15:09
  • 2
    @ACcreator: Use a memory fence. That's cheaper than a critical section, but still ensures that the caches have to be synchronized in the correct order also. – Ben Voigt Sep 15 '14 at 15:12
  • You see some interesting details and examples in [Memory Reordering Caught in the Act](http://preshing.com/20120515/memory-reordering-caught-in-the-act) which is from the same author from the article I link in my answer. The article in my answer also covers this a little bit as well, I just have not hard the chance to add a multi-threaded example yet. Both articles cover fences and other barriers. – Shafik Yaghmour Sep 15 '14 at 15:13
  • @ShafikYaghmour Thank you. I will go through these materials later :) – ACcreator Sep 15 '14 at 16:34
  • I updated my answer with another article from the same author(*Jeff Preshing*) he just writes the best blog entries on this topic. It should address most of the issues you are dealing with in your application and how to handle them. – Shafik Yaghmour Sep 16 '14 at 02:48

6 Answers6

55

Both standards allow for these instructions to be performed out of order, so long as that does not change observable behaviour. This is known as the as-if rule:

Note that as is pointed out in the comments, what is meant by "observable behaviour" is the observable behaviour of a program with defined behaviour. If your program has undefined behaviour, then the compiler is excused from reasoning about that.

Community
  • 1
  • 1
David Heffernan
  • 601,492
  • 42
  • 1,072
  • 1,490
  • 3
    Also, neither could be performed if they didn't affect the program's observable behaviour. (i.e. optimized out completely) – M.M Sep 15 '14 at 12:01
  • 2
    It's probably worth pointing out that accessing or modifying a variable only counts as "observable behaviour" if the variable is volatile. – Mike Seymour Sep 15 '14 at 12:02
  • @MikeSeymour I guess with operator overloading in C++, `A = a` could readily have observable behaviour, even for non-volatile variables. – David Heffernan Sep 15 '14 at 12:04
  • 1
    @DavidHeffernan: Yes, I should have been more precise, sorry. I meant "accessing or modifying a fundamentally-typed variable". Of course user-defined operations could have observable behaviour. – Mike Seymour Sep 15 '14 at 12:06
  • 1
    it will certainly change the observable behaviour in poorly written multithreaded programs! LOL! – Gianluca Ghettini Sep 15 '14 at 12:46
  • [It's also worth noting that some types of observable behavior are permitted to be optimized away](http://stackoverflow.com/questions/8890528/copy-constructor-elision), so the "as-if" rule is a little wishy-washy when it comes to constructors/destructors that cause observable behavior. – Cornstalks Sep 15 '14 at 13:25
  • 3
    In this case I think it's worth stressing (as G_G implies) that the "as if" requirement is that the observable behavior *of a program with defined behavior* doesn't change. The questioner must not take this answer to mean that any change of instruction order is guaranteed not change the behavior of his lockless hashtable. In fact if he's asking this question it's because that code contains a data race, so its behavior is not defined and may very well change due to optimizations, accidents of scheduling, and whatnot. – Steve Jessop Sep 15 '14 at 21:11
  • 1
    @Cornstalks copy elision is not performed under the as-if rule; there is specific text that defines copy elision. – M.M Sep 15 '14 at 21:37
  • @MattMcNabb: Ah, that makes sense. I've always heard it explained with the "as-if" rule being the justification, but apparently those explanations must be wrong (or, more likely, I misunderstood what I had read). Thanks. – Cornstalks Sep 15 '14 at 21:41
25

The compiler is only obligated to emulate the observable behavior of a program, so if a re-ordering would not violate that principle then it would be allowed. Assuming the behavior is well defined, if your program contains undefined behavior such as a data race then the behavior of the program will be unpredictable and as commented would require use of some form of synchronization to protect the critical section.

A Useful reference

An interesting article that covers this is Memory Ordering at Compile Time and it says:

The cardinal rule of memory reordering, which is universally followed by compiler developers and CPU vendors, could be phrased as follows:

Thou shalt not modify the behavior of a single-threaded program.

An Example

The article provides a simple program where we can see this reordering:

int A, B;  // Note: static storage duration so initialized to zero

void foo()
{
    A = B + 1;
    B = 0;
}

and shows at higher optimization levels B = 0 is done before A = B + 1, and we can reproduce this result using godbolt, which while using -O3 produces the following (see it live):

movl    $0, B(%rip) #, B
addl    $1, %eax    #, D.1624

Why?

Why does the compiler reorder? The article explains it is exactly the same reason the processor does so, because of complexity of the architecture:

As I mentioned at the start, the compiler modifies the order of memory interactions for the same reason that the processor does it – performance optimization. Such optimizations are a direct consequence of modern CPU complexity.

Standards

In the draft C++ standard this is covered in section 1.9 Program execution which says (emphasis mine going forward):

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.5

footnote 5 tells us this is also known as the as-if rule:

This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.

the draft C99 and draft C11 standard covers this in section 5.1.2.3 Program execution although we have to go to the index to see that it is called the as-if rule in the C standard as well:

as−if rule, 5.1.2.3

Update on Lock-Free considerations

The article An Introduction to Lock-Free Programming covers this topic well and for the OPs concerns on lock-less shared-hash table implementation this section is probably the most relevant:

Memory Ordering

As the flowchart suggests, any time you do lock-free programming for multicore (or any symmetric multiprocessor), and your environment does not guarantee sequential consistency, you must consider how to prevent memory reordering.

On today’s architectures, the tools to enforce correct memory ordering generally fall into three categories, which prevent both compiler reordering and processor reordering:

  • A lightweight sync or fence instruction, which I’ll talk about in future posts;
  • A full memory fence instruction, which I’ve demonstrated previously;
  • Memory operations which provide acquire or release semantics.

Acquire semantics prevent memory reordering of operations which follow it in program order, and release semantics prevent memory reordering of operations preceding it. These semantics are particularly suitable in cases when there’s a producer/consumer relationship, where one thread publishes some information and the other reads it. I’ll also talk about this more in a future post.

Michael Kohne
  • 11,888
  • 3
  • 47
  • 79
Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
  • it only makes me wonder, why GCC asm produces `addl $1, %eax` instead of `incl %eax`? Even for `a++` it only produces `a += 1` ... ICC behaves as expected though. –  Sep 15 '14 at 16:38
  • 1
    @vaxquis not clear to me, seems like some form of attempted optimization, probably dependent on the assumptions being made by `gcc`. – Shafik Yaghmour Sep 16 '14 at 15:13
3

If there is no dependency of instructions, these may be executed out of order also if final outcome is not affected. You can observe this while debugging a code compiled at higher optimization level.

Mohit Jain
  • 30,259
  • 8
  • 73
  • 100
1

Since A = a; and B = b; are independent in terms of data dependencies, this should not matter. If there was an output/outcome of previous instruction affecting the subsequent instruction's input, then ordering matters, otherwise not. this is strictly sequential execution normally.

Dr. Debasish Jana
  • 6,980
  • 4
  • 30
  • 69
1

My read is that this is required to work by the C++ standard; however if you're trying to use this for multithreading control, it doesn't work in that context because there is nothing here to guarantee the registers get written to memory in the right order.

As your edit indicates, you are trying to use it exactly where it will not work.

Joshua
  • 40,822
  • 8
  • 72
  • 132
0

It may be of interest that if you do this:

{ A=a, B=b; /*etc*/ }

Note the comma in place of the semi-colon.

Then the C++ specification and any confirming compiler will have to guarantee the execution order because operands of the comma operator are always evaluated left to right. This can indeed be used to prevent the optimizer from subverting your thread synchronization by reordering. The comma effectively becomes a barrier across which reordering is not allowed.

Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740