
Let's say I define the following C++ object:

#include <cstdint>

class AClass
{
public:
    AClass() : foo(0) {}
    uint32_t getFoo() { return foo; }
    void changeFoo() { foo = 5; }
private:
    uint32_t foo;
} aObject;

The object is shared by two threads, T1 and T2. T1 is constantly calling getFoo() in a loop to obtain a number (which will always be 0 if changeFoo() was not called before). At some point, T2 calls changeFoo() to change it (without any thread synchronization).

Is there any practical chance that the values ever obtained by T1 will be different than 0 or 5 with modern computer architectures and compilers? All the assembler code I investigated so far was using 32-bit memory reads and writes, which seems to preserve the integrity of the operation.

What about other primitive types?

Practical means that you can give an example of an existing architecture or a standard-compliant compiler where this (or a similar situation with a different code) is theoretically possible. I leave the word modern a bit subjective.


Edit: I can see many people noticing that I should not expect 5 to be read ever. That is perfectly fine with me, and I did not say I do (though thanks for pointing this aspect out). My question was more about what kind of data integrity violation can happen with the above code.

Andrew
  • The last paragraph makes little sense. A **practical** example of something where a certain outcome is **theoretically** possible? I suggest you remove one of those two words. :) – jalf Feb 22 '13 at 09:15
  • It is **theoretically** possible that it will happen on **every** existing architecture with **every** standard-compliant compiler, because a standard-compliant compiler can choose to do whatever it likes when it encounters UB. In practice, compilers tend to be forgiving about UB, but it is certainly *theoretically* possible that they do all sorts of other weird things. – jalf Feb 22 '13 at 09:16
  • @jalf It doesn't really matter here, since the code won't work as expected (that the reading thread will eventually see 5) on any architecture I know (certainly not with VC++ under Windows, g++ under Linux, or Sun CC under Solaris). – James Kanze Feb 22 '13 at 09:18
  • Simply add *volatile* to your variable to avoid optimization surprises: `volatile uint32_t foo;` – KBart Feb 22 '13 at 09:23
  • @KBart `volatile` helps compiler optimizations, but doesn't do anything about the read and write pipelines on modern processors. (Arguably, it should, but in reality, it doesn't.) – James Kanze Feb 22 '13 at 09:33
  • 1
    James Kanze: `volatile` doesn't help compiler optimization, it hinders it. It directs the compiler that reads and writes cannot be folded / optimized away. The primary purpose is for accessing (memory-mapped) hardware registers. – davmac Feb 22 '13 at 09:53
  • Andrew, your question doesn't make a lot of sense. It's perfectly well defined what object code a particular version of a particular compiler for a particular architecture will produce for any given source code. It's just not defined by the language standard. Could your example ever produce a value other than 0 or 5? probably not with existing compilers on eg x86 architecture. But on some other architecture where concurrent access must be regulated, it certainly could. I know of no such architecture. – davmac Feb 22 '13 at 13:43
  • @davmac: Good! The existence of such compilers/architectures is actually what I want to know. And perhaps some reasonable comments on the practical possibility of their existence in the future. Since you say you don't know of such an architecture, and since nobody mentioned examples other than 16-bit architectures (which are old enough for me), with every hour it makes me more convinced it's practical for me to use code like the above when C++11 features are out of reach at my company (unfortunately). I think I can take the risk. – Andrew Feb 22 '13 at 14:55

7 Answers

14

In practice, you will not see anything other than 0 or 5 as far as I know (except perhaps on some weird 16-bit architecture with a 32-bit int, where this might not be the case).

However whether you actually see 5 at all is not guaranteed.

Suppose I am the compiler.

I see:

while (aObject.getFoo() == 0) {
    printf("Sleeping");
    sleep(1);
}

I know that:

  • printf cannot change aObject
  • sleep cannot change aObject
  • getFoo does not change aObject (thanks to the inline definition)

And therefore I can safely transform the code:

while (true) {
    printf("Sleeping");
    sleep(1);
}

Because no one else is accessing aObject during this loop, according to the C++ Standard.

That is what undefined behavior means: blown up expectations.

Matthieu M.
  • And then it really boils down to "are you willing to gamble your program's correctness on the assumption that the compiler is not, and will never be, clever enough to make this optimization". :) – jalf Feb 22 '13 at 09:18
  • 1
    But the question is *"Is there any practical chance that the values ever obtained by T1 will be different than 0 or 5 with modern computer architectures and compilers?"*, and the answer is "no", so long as the load and store are atomic. – Ed S. Feb 22 '13 at 09:19
  • 1
    @EdS.: **and** as long as neither of the two are optimized out (because otherwise they won't reach the CPU at all, and then it doesn't matter if they'd be executed atomically or not), but as this answer points out, the optimizer is allowed to optimize it out – jalf Feb 22 '13 at 09:20
  • @jalf: No, optimized or not, the question is *"Is there any practical chance that the values ever obtained by T1 will be **different than 0 or 5** "*. He didn't ask *"will the other thread ever read a 5, or will it always read 0*". He is asking whether or not he will ever read some intermediate value. – Ed S. Feb 22 '13 at 09:24
  • 5
    @EdS.: you might not like it, but this answer brings up a very important point, in that it explains how this code could fail to work *practically speaking*. It gives a plausible reason why a real compiler might break this code, without resorting to "nasal demons" or "ordering pizza" or "the compiler hates you and just wants to spite you". This answer points out why the optimizer might break the code in its quest to make your code fast. I think that's pretty significant. Because it means that even if every compiler today accepts the code, the next version might fail – jalf Feb 22 '13 at 09:24
  • @jalf: I'm not saying it isn't interesting or informative, I'm saying *it does not address the question that the OP actually asked* – Ed S. Feb 22 '13 at 09:25
  • @EdS.: true enough. But he clearly wrote that question under the assumption that "both the values 0 and 5 will be seen", and they might not be – jalf Feb 22 '13 at 09:25
  • @jalf: ...which is why he asked :D – Ed S. Feb 22 '13 at 09:26
  • 5
    @EdS.: no, as you are so keen to point out, he asked *something else*. He assumes that "oh, we'll certainly get the values 0 and 5 eventually", so he instead asked "but will we also get *other* intermediate values". And no, we (practically speaking) won't, but his premise is flawed, because we might never see the value 5 either. So you're right, it doesn't *technically speaking* answer the question, but it answers what he needs to know. Because the assumption that led to the question is wrong. – jalf Feb 22 '13 at 09:27
  • @Matthieu: I guess adding `volatile` qualifier would oblige the compiler not to make such an optimization? – Andrew Feb 22 '13 at 09:31
  • @jalf: No, it gives great technical insight, and not a single yes or no to the answer, which you *assume* comes from a flawed premise. It's great to point out that the compiler can optimize away the assignment, but that information should come *in addition to* answering the question. – Ed S. Feb 22 '13 at 09:37
  • 1
    @Andrew: I am not too savvy about `volatile`, but I think that yes, it would prevent this optimization. However `volatile` was originally meant for hardware access, and therefore there may be issues in the context of multithreading related to the order in which writes become apparent to another thread. TL;DR: just use `std::atomic` and let the compiler optimize. – Matthieu M. Feb 22 '13 at 09:44
  • @Andrew it would (writes to a `volatile` variable count as observable behavior, so they may not be optimized away and may not be reordered). But it is overkill. A memory fence or an `std::atomic` would be more appropriate (and no more expensive) – jalf Feb 22 '13 at 09:52
  • @EdS.: Right, a more direct answer would definitely help. I added a couple sentences at the top to make it clear what is answered by this question. – Matthieu M. Feb 22 '13 at 10:01
  • @MatthieuM. it's perfectly OK to use `volatile` in the context of multithreading, even [MSDN mentions](http://msdn.microsoft.com/en-us/library/12a04hfd%28v=vs.80%29.aspx) that: >The volatile keyword is a type qualifier used to declare that an object can be modified in the program by something such as the operating system, the hardware, or **a concurrently executing thread**. – KBart Feb 22 '13 at 11:01
  • P.S. I'm not saying `volatile` is the best solution here, but it's one of options. – KBart Feb 22 '13 at 11:02
  • 2
    @KBart `volatile` as a sometimes-applicable synchronization primitive in concurrent contexts is an MS extension. Using `volatile` as a synchronization primitive is not portable or recommended for modern programs. – justin Feb 22 '13 at 13:03
4

In practice, all mainstream 32-bit architectures perform 32-bit reads and writes atomically. You'll never see anything other than 0 or 5.

Marcelo Cantos
  • 1
    That is wrong. The compiler could change your program so that the changes will never be visible to other threads. – Stephan Dollberg Feb 22 '13 at 08:59
  • 5
    @bamboon: I made two claims. Which of them is false? – Marcelo Cantos Feb 22 '13 at 09:00
  • Sorry for being imprecise; I meant the latter part. Two threads accessing a variable where one is a write and another is a read results in a race condition, which is UB per the standard. – Stephan Dollberg Feb 22 '13 at 09:03
  • @bamboon: It's not a race condition if the load and store are atomic, which is exactly what I said in my response. – Ed S. Feb 22 '13 at 09:05
  • 2
    +1 for a practical answer, not based on any religious beliefs. If you store/read variable in an *atomic* operation, there is no race condition and no undefined behaviour. – KBart Feb 22 '13 at 09:10
  • @MarceloCantos You don't say what you mean by "atomically". According to the definition used in the standard, most 32 bit architectures do _not_ perform 32 bit reads and writes atomically. – James Kanze Feb 22 '13 at 09:13
  • @JamesKanze: I'm not sure why this phrase is still coming up: *"According to the definition used in the standard"*. The standard does not perfectly define every scenario. This was explicitly a *practical* question, so the standard + reality should be considered, not one in absence of the other. – Ed S. Feb 22 '13 at 09:15
  • 1
    @EdS. There are two widely used definitions of "atomic" with regards to memory accesses. The standard uses one (the one most used in modern use). No one is claiming that the access will result in any sort of slicing, with another thread seeing some of the bits modified, and not others (especially as he only modifies the bits of a single byte). But that's only part of the problem on a modern processor. A store access from one thread may never be seen in another thread _unless_ that access is followed by a `membar` or some sort of fence instruction. – James Kanze Feb 22 '13 at 09:25
  • @JamesKanze: Since the OP asked if something other than 0 or 5 might pop out, I'd say they were more concerned with slicing than fences. I agree, though, that the fencing issue needs to be considered. – Marcelo Cantos Feb 22 '13 at 09:42
2

In practice (for those who did not read the question), any potential problem boils down to whether or not a store operation for an unsigned int is an atomic operation, which, on most (if not all) machines you will likely write code for, it will be.

Note that this is not stated by the standard; it is specific to the architecture you are targeting. I cannot envision a scenario in which a calling thread will read anything other than 0 or 5.

As to the title... I am unaware of varying degrees of "undefined behavior". UB is UB; it is a binary state.

Ed S.
  • That is wrong. The compiler could change your program so that the changes will never be visible to other threads. – Stephan Dollberg Feb 22 '13 at 08:59
  • @bamboon: Care to explain how? – Ed S. Feb 22 '13 at 09:01
  • Two threads accessing a variable where one is a write and another is a read results in a race-condition which is UB per standard. – Stephan Dollberg Feb 22 '13 at 09:03
  • @bamboon: Umm, no, it's not. That is likely an atomic store on a 32-bit integer, i.e., no race condition. That's what I said, and it is correct – Ed S. Feb 22 '13 at 09:03
  • It is, see standard 1.11.21. (using N3376 here) – Stephan Dollberg Feb 22 '13 at 09:07
  • 3
    First, the access to `foo` is _not_ atomic in the sense the standard uses the term. Second, while I can't imagine a scenario where the reading thread would see anything but `0` or `5`, I can imagine quite a few (including some very, very likely) where the reading thread will never see anything but `0`, regardless of how many times `changeFoo` is called. – James Kanze Feb 22 '13 at 09:07
  • @JamesKanze: Yet in practice, your x86 chip will always load and store that value atomically. You skipped over an important part of the question. So, while the technical, standard-only answer is interesting, it is not what the OP asked for. He asked about what could happen *in practice*. – Ed S. Feb 22 '13 at 09:08
  • 5
    @EdS.: You are forgetting something very important, the *As If* behavior. The compiler is entitled to produce whatever code it wants as long as the behavior is *as if* it had produced a literal translation; the rules that determine whether it's *as if* or not are given by the Standard, and whenever you read *undefined behavior* it means the compiler can suppose it will never happen. Therefore, it can lift `getFoo` out of the loop... if it can prove no call to `changeFoo` ever happens *within this thread*. And then you are damned. – Matthieu M. Feb 22 '13 at 09:09
  • @MatthieuM.: That's a good point, but again, the OP asked if another thread *can ever see anything but a 0 or a 5*. I can't imagine a scenario in which one could. I'm not saying I know everything and there's nothing I haven't considered, but that's my answer. – Ed S. Feb 22 '13 at 09:11
  • @EdS. No. Not even in practice. At least not for the definition of atomic used in the standard (which is the only useful definition in a multithreaded world). – James Kanze Feb 22 '13 at 09:12
  • @JamesKanze: And again you are ignoring the question. This was not a theoretical inquiry, it was specifically asking about practical applications. In that context my answer is correct as far as I am aware. – Ed S. Feb 22 '13 at 09:12
  • @EdS.: he was asking about "practical applications" where another outcome is "theoretically possible". :) – jalf Feb 22 '13 at 09:19
  • @EdS. You're ignoring reality. A store operation of an `int` isn't "atomic" on a Sparc, nor as far as I know on an Intel, or any other modern processor with a memory pipeline or a cache. – James Kanze Feb 22 '13 at 09:20
  • @jalf: Sure, but in practice, it isn't (as long as the load and store are atomic). What I don't get is the answers stating only "UB, anything can happen herp derp". They're not helpful in this context. – Ed S. Feb 22 '13 at 09:21
  • @JamesKanze: it certainly is on Intel CPUs (assuming the `int` is properly aligned). Regardless of pipelining and cache, every core will always either see the write has having been performed, or not having been performed. No inbetween states. That's guaranteed by the ISA :) – jalf Feb 22 '13 at 09:22
  • @JamesKanze: You apparently didn't read my response either. Kudos. And I quote: *Any potential problem boils down to whether or not a store operation for an unsigned int is an atomic operation which, on most (if not all) machines you will likely write code for, it will be"*. So yeah, I already covered that. – Ed S. Feb 22 '13 at 09:22
  • @EdS. And you're ignoring what I'm saying: that a store operation for an unsigned int is _not_ an atomic operation (in the most widely used sense of atomic today) on a modern Intel or Sparc. (I'm less familiar with other processors.) For it to be atomic, you need additional instructions (or a LOCK prefix on an Intel). – James Kanze Feb 22 '13 at 09:26
  • @jalf Not according to the Intel documentation. The Intel processor has fence instructions for a good reason. – James Kanze Feb 22 '13 at 09:31
  • @JamesKanze: ...Not sure where you're getting your information, or perhaps you have a wildly exotic definition for "atomic". Download [this](http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html), chapter 8. According to it, atomic store operations include: storing a byte, storing word-aligned word and dword-aligned dword on all x86 CPUs. I honestly don't know what you are talking about. – Ed S. Feb 22 '13 at 09:34
  • @JamesKanze: As I said before, there's nothing wrong with going into more detail, but what is the answer to *"Is there any practical chance that the values ever obtained by T1 will be different than 0 or 5"*. The answer is "no", which is all that I have ever said through these comments. Ok, I have got to go to bed now, it's late. I will say this; The OP has a wealth of information to go over. – Ed S. Feb 22 '13 at 09:38
  • @JamesKanze yes it does, but that has nothing to do with atomicity. word-sized reads and writes are guaranteed to be atomic. But they are not all guaranteed to be performed in order, unless you insert these fence instructions. Those are completely separate issues. Fences are for forcing ordering between instructions, it does nothing about atomicity. Atomicity is guaranteed for any well-aligned word-sized read/write. – jalf Feb 22 '13 at 09:46
  • @EdS. They seem to be contradicting themselves (or using a very weak definition of "atomic"); in chapter 8.2, they say more or less exactly what I've been saying, that accesses are _not_ guaranteed to be atomic, in the sense the word is used in the C++ standard, and that it is usually used when talking about multithreaded issues. (There is a clear semantic shift here, because atomic didn't used to have this stronger meaning.) – James Kanze Feb 22 '13 at 10:23
  • @jalf In the C++ standard, and in most recent books about threading, "atomic" is used to imply ordering (and guaranteed execution). This is a clear semantic shift; it would probably have been better if some other term had been used, but ordering _is_ part of the modern meaning of "atomic". In particular, it's quite clear that unless there is a fence instruction, other threads are not guaranteed to see your write. – James Kanze Feb 22 '13 at 10:26
  • @EdS. Anyway, I think our disagreement is more about the meaning of the word "atomic", than what actually happens at a hardware level. (Since you've posted a link to the same document I base my claims on.) And in practice: the reading process will never see anything but 0 or 5, but it may never see the 5. And also, on a PC, there's enough other stuff going on that the OS will eventually do a context switch, in which case, _it_ will emit a fence instruction, so the other thread will eventually see a 5 (but maybe only seconds after it was written). – James Kanze Feb 22 '13 at 10:30
  • @JamesKanze: what. I'm sorry, but no. The `std::atomic` template has broader semantics than just guaranteeing atomicity, but that does not change the semantic of the word "atomicity" in general. Where does the C++ standard define the concept of something being "atomic" (without talking about `std::atomic` specifically) the way you describe? – jalf Feb 22 '13 at 12:53
  • @jalf The name C++ uses was chosen because the semantics correspond to the most widely used meaning of atomic today (at least with regard to memory access). – James Kanze Feb 22 '13 at 13:40
2

Not sure what you're looking for. On most modern architectures, there is a very distinct possibility that getFoo() always returns 0, even after changeFoo has been called. With just about any decent compiler, it's almost guaranteed that getFoo() will always return the same value, regardless of any calls to changeFoo, if it is called in a tight loop.

Of course, in any real program, there will be other reads and writes, which will be totally unsynchronized with regards to the changes in foo.

And finally, there are 16-bit processors, and with some compilers there may also be a possibility that the uint32_t isn't aligned, so that the accesses won't be atomic. (Of course, you're only changing bits in one of the bytes, so this might not be an issue.)

James Kanze
  • +1 here, it does answer the question (I didn't see that you posted an answer during all of the commenting). – Ed S. Feb 22 '13 at 09:41
2

Is there any practical chance that the values ever obtained by T1 will be different than 0 or 5 with modern computer architectures and compilers? What about other primitive types?

Sure - there is no guarantee that the entire value will be written and read in an atomic manner. In practice, you may end up with a read which occurred during a partial write. What may be interrupted, and when that happens, depends on several variables. So in practice, the results could easily vary as the size and alignment of types vary. Naturally, that variance may also be introduced as your program moves from platform to platform and as ABIs change. Furthermore, observable results may vary as optimizations are added and other types/abstractions are introduced. A compiler is free to optimize away much of your program; perhaps completely, depending on the scope of the instance (yet another variable which is not considered in the OP).

Beyond optimizers, compilers, and hardware-specific pipelines: the kernel can even affect the manner in which this memory region is handled. Does your program guarantee where the memory of each object resides? Probably not. Your object's memory may exist on separate virtual memory pages -- what steps does your program take to ensure the memory is read and written in a consistent manner for all platforms/kernels? (none, apparently)

In short: If you cannot play by the rules defined by the abstract machine, you should not use the interface of said abstract machine (e.g. you should just understand and use assembly if the specification of C++'s abstract machine is truly inadequate for your needs -- highly improbable).

All the assembler code I investigated so far was using 32-bit memory reads and writes, which seems to save the integrity of the operation.

That's a very shallow definition of "integrity". All you have is (pseudo-)sequential consistency. As well, the compiler needs only to behave as if in such a scenario -- which is far from strict consistency. The shallow expectation means that even if the compiler actually made no breaking optimization and performed reads and writes in accordance with some ideal or intention, the result would be practically useless -- your program would observe changes typically 'long' after their occurrence.

The subject remains irrelevant, given what specifically you can guarantee.

justin
  • 1
    The last sentence is the very important one. – Stephan Dollberg Feb 22 '13 at 09:48
  • @justin: I can't agree with the last paragraph: Imagine a case (and this was actually the root of my question) where I have several threads that perform tasks in loops and they count (=>write) the number of executions just for the statistics, and I want to display (=>read) this number to see statistics. It doesn't statistically matter to me whether the number is 1000000 or 1000042. If unsynchronized write-read can possibly make the program crash, it IS an issue for me. But if it just makes me see a fake number, I can live with it. So the result COULD be practically useful. – Andrew Feb 22 '13 at 10:31
  • @Andrew How does that make sense? Why would you want to create statistics which could be totally wrong. What would the purpose be of creating false data? They could potentially differ by numbers much larger than 42. – Stephan Dollberg Feb 22 '13 at 10:37
  • @Andrew most obvious case in that scenario: since you are not guaranteed strict consistency, read and write operations to the shared counter are ultimately divisible. therefore, the accumulator value will be exact only in very specific scenarios (scenarios you cannot guarantee using the abstract machine in combination with sequential consistency alone). illustration: say you have 15 threads and they all perform 1 million operations/increments, sharing one counter - consider it a miracle (read: purely chance) if your program's operation counter consistently reaches 15 million. – justin Feb 22 '13 at 11:27
  • @Andrew that also means that the accuracy of the counter can vary dramatically as the compilers, hardware, optimizations, machine's workload during execution, etc. changes. so it could be 99% accurate in one run, and 96% accurate the next time you run it. – justin Feb 22 '13 at 11:35
  • @bamboon, sure they can but it's me (human who reads the statistic) to decide what to do with this knowledge. None of the part of the program's execution depends on this value. – Andrew Feb 22 '13 at 11:42
  • Another _usefulness_ example: I want to refresh a cached value no more often than 1 mln loops (loops are done by multiple threads). Even if the actual refresh happens after 50 mln it is still acceptable (though I can start thinking of changing the OS because its scheduler apparently sucks). – Andrew Feb 22 '13 at 11:57
1

Undefined behavior means that the compiler can do whatever it wants. It could basically change your program to do whatever it likes, e.g. order a pizza.

See @Matthieu M.'s answer for a less sarcastic version than this one. I won't delete this, as I think the comments are important for the discussion.

Stephan Dollberg
  • This addresses the title of the question, but not the question itself. – Ed S. Feb 22 '13 at 09:00
  • It does, because the rest is irrelevant in the presence of UB. – Stephan Dollberg Feb 22 '13 at 09:01
  • Two threads accessing a variable where one is a write and another is a read. – Stephan Dollberg Feb 22 '13 at 09:02
  • Are you aware of atomic operations? It depends on the processor architecture. – Ed S. Feb 22 '13 at 09:04
  • Are you under the impression that using `std::atomic` is the only way to perform atomic operations? I write plenty of code that will only ever be run on architectures which guarantee atomic reads and writes of 32-bit integers. I don't use `std::atomic` for those. – Ed S. Feb 22 '13 at 09:06
  • Yes, that is the one and only way to do it without invoking undefined behavior. – Stephan Dollberg Feb 22 '13 at 09:06
  • No, no it's not. The processor can guarantee that. You are referring to a *potential* race condition, dependent upon the target architecture. – Ed S. Feb 22 '13 at 09:06
  • @bamboon: This was not a 'what is undefined behavior' question. Besides, though it is (is it?) undefined behavior for the program, the OS and computer architecture place quite good constraints on what a program that has gone mad can do. Thus, it won't order a pizza and it is well-defined that it won't. – Andrew Feb 22 '13 at 09:07
  • @EdS.: the processor can guarantee *nothing*, if the compiler is allowed to change the behavior of your code. If something is undefined, then it is *theoretically* possible that the compiler will make it do absolutely anything, and there is nothing the CPU can do about that – jalf Feb 22 '13 at 09:10
  • 4
    @Andrew: much the same applies here. If it is UB, then it is UB, and the compiler can *in theory* just generate whatever code it likes. Now, unless you are running the program with *very* restrictive privileges, then it is most likely able to make HTTP requests. Which means that it could theoretically order pizza, and do so with the OS's blessing. :) – jalf Feb 22 '13 at 09:12
  • @EdS. Do you know what "atomic access" means? On most modern architectures, you'll need some sort of fence or memory barrier for the access to be guaranteed; I know of no compiler which would generate one with the above code. – James Kanze Feb 22 '13 at 09:15
  • @JamesKanze: Where is that "smiley banging his head against a wall" emote when you need one...? – Ed S. Feb 22 '13 at 09:15
  • @JamesKanze: How about this; answer the question. Describe a scenario in which a calling thread will read a value other than 0 or 5 on architectures on which 32-bit load and stores are atomic. I will be eagerly anticipating your response. – Ed S. Feb 22 '13 at 09:16
  • @jalf: Indeed, I agree. I did not consider compiler being able to go mad because of detecting a logic leading to undefined behavior. Yet, show me a one that would ;) – Andrew Feb 22 '13 at 12:05
0

Undefined behavior is guaranteed to be as undefined as the word undefined.
Technically, the observable behavior is pointless because it is simply undefined behavior; the compiler is not required to exhibit any particular behavior. It may work as you think it should, or it may not, or it may burn your computer: anything and everything is possible.

Alok Save
  • I don't remember any specs saying that 'race condition' implies 'undefined behavior' in a way the term is used by the specs. Or does it? – Andrew Feb 22 '13 at 08:59
  • 5
    @Andrew Unfortunately, yes. [intro.multithread]§21: The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. – Angew is no longer proud of SO Feb 22 '13 at 09:03
  • I have trouble finding this part in pre-C++11 specifications. It means that your answer is correct only for those compilers that comply with C++11 (and provided that I activate C++11 features, like in gcc). For those that don't, this is not undefined behavior. But maybe I took the wrong document, did I? – Andrew Feb 22 '13 at 09:38
  • 1
    @Andrew Before C++11 the standard wasn't aware of multithreading at all, which means that basically any multithreading was UB, or let's rather say implementation-defined by the specific implementation you were using (e.g. pthreads). – Stephan Dollberg Feb 22 '13 at 09:42
  • @Andrew: Pre-C++11 standards do not mention multithreading at all. Pre-C++11 standards provide some guarantees on simultaneous access for objects, but not in an elaborated way. The answer, though, is true for both pre-C++11 and C++11-compliant compilers. – Alok Save Feb 22 '13 at 09:44