117

Consider the following function:

void func(bool& flag)
{
    if(!flag) flag=true;
}

It seems to me that if flag has a valid boolean value, this would be equivalent to unconditional setting it to true, like this:

void func(bool& flag)
{
    flag=true;
}

Yet neither gcc nor clang optimize it this way — both generate the following at -O3 optimization level:

_Z4funcRb:
.LFB0:
    .cfi_startproc
    cmp BYTE PTR [rdi], 0
    jne .L1
    mov BYTE PTR [rdi], 1
.L1:
    rep ret

My question is: is it just that the code is too special-case to care to optimize, or are there any good reasons why such optimization would be undesired, given that flag is not a reference to volatile? It seems the only reason which might be is that flag could somehow have a non-true-or-false value without undefined behavior at the point of reading it, but I'm not sure whether this is possible.

200_success
  • 7,286
  • 1
  • 43
  • 74
Ruslan
  • 18,162
  • 8
  • 67
  • 136
  • 8
    Do you have any evidence that it's an "optimization"? – David Schwartz Oct 28 '16 at 20:51
  • 1
    @200_success I don't think that putting a line of code with non-working markup as a title is a good thing. If you want a more specific title, fine *but* choose an *English sentence*, and try to avoid code in it (e.g. *why don't compilers optimize conditional writes to unconditional writes when they can prove them to be equivalent?* or similar). Also since backticks aren't rendered don't use them in the title even if you use code. – Bakuriu Oct 28 '16 at 22:08
  • 2
    @Ruslan, while it doesn't seem to do this optimization for the function itself, when it can inline the code, it does seem to do so for the inlined version. Often just resulting in a compile time constant of `1` being used. https://godbolt.org/g/swe0tc – Evan Teran Oct 29 '16 at 20:41
  • I closed this as a duplicate of a newer question ([Why do none of the major compilers optimize this conditional store that checks if the value is already set?](https://stackoverflow.com/q/76261546)) because I think my answer there covers the same points brought up by the answers here (and other things they didn't), but in more detail and citing references that it's not UB to cast away const, thus compilers must respect the possibility. Perhaps the questions should be merged. – Peter Cordes May 19 '23 at 09:30

3 Answers3

104

This may negatively impact the performance of the program due to cache coherence considerations. Writing to flag each time func() is called would dirty the containing cache line. This will happen regardless of the fact that the value being written exactly matches the bits found at the destination address before the write.


EDIT

hvd has provided another good reason that prevents such an optimization. It is a more compelling argument against the proposed optimization, since it may result in undefined behavior, whereas my (original) answer only addressed performance aspects.

After a little more reflection, I can propose one more example why compilers should be strongly banned - unless they can prove that the transformation is safe for a particular context - from introducing the unconditional write. Consider this code:

const bool foo = true;

int main()
{
    func(const_cast<bool&>(foo));
}

With an unconditional write in func() this definitely triggers undefined behavior (writing to read-only memory will terminate the program, even if the effect of the write would otherwise be a no-op).

Community
  • 1
  • 1
Leon
  • 31,443
  • 4
  • 72
  • 97
  • Good point. Optimisations these days tend to be arbitrations between competing interests - once you include other code around it that may or may not parallelise on the given CPU, it becomes more like Black Magic. – Robinson Oct 28 '16 at 10:45
  • 10
    It may also impact positively on performance since you get rid of a branch. So I don't think this particular case is meaningful to discuss without a very specific system in mind. – Lundin Oct 28 '16 at 10:46
  • In short, on a modern PC OS/system, only the first reason above is a reason not do do this -- causing cache line to dirty. The other reasons are reasons why a *different* target platform wouldn't want to do this. – Yakk - Adam Nevraumont Oct 28 '16 at 13:33
  • 3
    @Yakk behavior definedness is not affected by the target platform. Saying that it will terminate the program is incorrect, but the UB itself can have far-reaching consequences, including nasal demons. – John Dvorak Oct 28 '16 at 14:13
  • 18
    @Yakk That depends on what one means by "read-only memory". No, it's not in a ROM chip, but it is very often in a section loaded to a page that does not have write access enabled, and you will get e.g. a SIGSEGV signal or STATUS_ACCESS_VIOLATION exception when you try to write to it. – Random832 Oct 28 '16 at 14:19
  • 6
    "this definitely triggers undefined behavior". No. Undefined behavior is a property of the abstract machine. It is what the code says that determines whether UB is present. Compilers cannot cause it (though if buggy a compiler can cause programs to behave incorrectly). – Eric M Schmidt Oct 28 '16 at 17:36
  • 2
    @JanDvorak Individual platforms are allowed to define behaviour that is undefined by the standard. Yakk was under the impression that writing to a const object was well-defined on "a modern PC OS/system" – user253751 Oct 29 '16 at 01:22
  • 8
    It is the casting-away of `const` to pass into a function _that may modify the data_ that is the source of the undefined behavior, not the unconditional write. Doctor, it hurts when I do this.... – Spencer Oct 29 '16 at 02:02
  • 2
    @Spencer Even still, a compiler optimization shouldn't break code that doesn't invoke UB. This code is standards compliant as long as `foo` is true. – Navin Oct 29 '16 at 04:56
  • @immibis even then the main headache caused by undefined behavior isn't the occasional SIGSEGV that may happen at runtime. It is that it allows the compilers to rip out large chunks of your code and replace them with anything else or nothing at all. And GCC in particular loves to do that. – John Dvorak Oct 29 '16 at 11:17
  • 1
    You can't "cause" or "invoke" undefined behaviour. If your program _has_ undefined behaviour, it means that you wrote code for which the behaviour is not defined. It's not an event, or a special dance, or a special execution state in the computer. Similarly, you cannot be going around assuming that your computer "will terminate the program". – Lightness Races in Orbit Oct 29 '16 at 16:48
  • 1
    I find that very interesting. Given that cache misses are *phenomenally* expensive, wouldn't it be worthwhile to detect the (very common) case that a write writes exactly the same value as before, and then skip the eviction? – Kilian Foth Oct 30 '16 at 06:30
  • 1
    Isn't the compiler allowed to perform optimizations that are based on the assumption that the program doesn't have UB? I've seen numerous SO questions about code generation that takes advantage of this. – Barmar Nov 01 '16 at 19:22
  • @Barmar Indeed. Besides, the fact here is that the function is declared to take a non-`const` reference. So it should not be passed a reference to a `const`-declared object... _unless_ the caller is completely in control of the implementation and can guarantee it won't try to modify it. Anyone doing otherwise deserves whatever they get. The possibility wasn't mentioned in the question and is not relevant here at all. The idea that compilers should cripple themselves to work around fundamentally broken code is completely absurd, and I can only trust that no compiler makes any such consideration – underscore_d Nov 09 '16 at 23:34
  • 2
    @EricMSchmidt "this definitely triggers undefined behavior" - yes, because the standard says it's undefined. The rest of the sentence is incorrect, because it tries to define the behavior you should expect after the undefined construct. Writing to read-only memory needn't terminate the program. – Mark Ransom Sep 14 '19 at 02:39
  • @Spencer: That's not correct, the code in this answer doesn't have undefined behaviour in C++, since it never tries to modify the `const` object through the non-`const` reference. See [Is it allowed to cast away const on a const-defined object as long as it is not actually modified?](https://stackoverflow.com/q/54504247) / [Why do none of the major compilers optimize this conditional store that checks if the value is already set?](https://stackoverflow.com/q/76261546) – Peter Cordes May 19 '23 at 09:26
  • @PeterCordes Huh? Leon's answer doesn't specify a body for `func` so there's no way you can know that. Notice that the body of `func` in OP **does indeed modify its argument** and Leon asserts there is an "unconditional write" inside `func`. – Spencer May 19 '23 at 14:02
  • @Spencer: With the question's first definition of `func`, where it only writes if it's not already true, there's no UB to call it on `const bool foo = true;`. *If* you called it with the second definition, which does an unconditional store, then of course you'd have UB. It's unfortunate that the question called them both `func`, not `func1` and `func2` or something. Anyway it's only the unconditional store that is UB, not the cast and not passing it to a function that *may* modify it (but doesn't in this case). – Peter Cordes May 19 '23 at 19:35
49

Aside from Leon's answer on performance:

Suppose flag is true. Suppose two threads are constantly calling func(flag). The function as written, in that case, does not store anything to flag, so this should be thread-safe. Two threads do access the same memory, but only to read it. Unconditionally setting flag to true means two different threads would be writing to the same memory. This is not safe, this is unsafe even if the data being written is identical to the data that's already there.

  • But in the same scenario one thread could read while the other is writing. Isn't it UB anyway? – Ruslan Oct 28 '16 at 11:06
  • 10
    I *think* this is a result of applying [`[intro.races]/21`](http://eel.is/c++draft/intro.races#21). – Griwes Oct 28 '16 at 11:08
  • @Ruslan Given that `flag` is `true`, no writing can occur in the first scenario. – molbdnilo Oct 28 '16 at 11:12
  • 10
    Very interesting. So I read this as: The compiler is *never* allowed to "optimize in" a write operation where the abstract machine wouldn't have one. – Martin Ba Oct 28 '16 at 11:24
  • 3
    @MartinBa Mostly so. But if the compiler can prove that it doesn't matter, for example because it can prove that no other thread could possibly have access to that particular variable, then it may be okay. –  Oct 28 '16 at 11:38
  • 14
    This is only unsafe *if the system the compiler is targeting makes it unsafe*. I have never developed on a system where writing `0x01` to a byte that is already `0x01` causes "unsafe" behavior. On a system with word or dword memory access it would; but the optimizer should be aware of this. On a modern PC or phone OS, no problem occurs. So this isn't a valid reason. – Yakk - Adam Nevraumont Oct 28 '16 at 13:31
  • 1
    @Yakk If it is possible to somehow obtain a *valid* non-const reference to a bool variable in read-only memory (of course value of that bool better be true), then this optimization will lead to segfault (not really UB since this is internal compiler stuff and below the level where "UB" is defined), when it shouldn't. I don't know if it is possible on popular desktop or mobile operating systems, though. – hyde Oct 28 '16 at 14:01
  • 1
    @Yakk Disagree with "but the optimizer should be aware of this" for other processors: if it's UB, the optimiser shouldn't need to worry about it. But yeah, thinking about it, for current common processors, including the one the OP was looking at, for data types that are read and written atomically, I agree with that part of your comment. My answer might apply to `void func(long long& ll) { if(ll) ll = 0; }` on x86-32, since `long long` cannot necessarily be written to atomically (it's allowed to be misaligned), I'm not sure, but not to the code in the question. Will probably delete in a bit. –  Oct 28 '16 at 14:21
  • 4
    @Yakk Actually, thinking even more, I think this is right after all, even for common processors. I think you're right when the CPU can write to the memory directly, but suppose `flag` is in a copy-on-write page. Now, at the CPU level, the behaviour might be defined (page fault, let the OS handle it), but at the OS level, it may still be undefined, right? –  Oct 28 '16 at 15:44
  • 1
    @hvd well, defined but surprising. – Yakk - Adam Nevraumont Oct 28 '16 at 17:01
  • @hvd I don't get why that matters here. Care to explain? – Veedrac Oct 29 '16 at 14:39
  • @Veedrac Is that a question about my answer, or about an earlier comment of mine? If the former, if code as written is a valid way of doing what the author wants, and it is compiled into something that doesn't do what the author wants, who else would be to blame but the compiler? If the latter, which comment? –  Oct 29 '16 at 15:11
  • @hvd call me a novice, but why is it unsafe if they're both writing the data that's already there? – Andy Oct 29 '16 at 16:10
  • 1
    @Andy When it comes to concurrent writes, things get tricky, and I'll admit it's not obvious, but generally, it's better to treat it as unsafe unless you can prove it's safe. Initially I was convinced by Yakk's explanation that it would be safe in this specific case, but see [this comment](http://stackoverflow.com/questions/40303182/why-dont-c-compilers-optimize-this-conditional-boolean-assignment-as-an-uncon/40303540#comment67874519_40303540) for why I ended up leaving my answer as it is. –  Oct 29 '16 at 16:20
  • @hvd okay, yup makes sense – Andy Oct 29 '16 at 16:37
  • @hvd [This comment.](http://stackoverflow.com/questions/40303182/why-dont-c-compilers-optimize-this-conditional-boolean-assignment-as-an-uncon/40303540?noredirect=1#comment67874519_40303540) – Veedrac Oct 29 '16 at 17:41
  • @Veedrac Then I don't understand your question. It should, I think, be obvious that if an OS says "don't do X", and code doesn't do "X", then a compiler shouldn't transform it into something that does do "X". That's what I was trying to illustrate, but apparently not clearly enough. –  Oct 29 '16 at 18:24
  • @hvd In what sense is the OS saying "don't do X"? – Veedrac Oct 30 '16 at 00:14
  • @Veedrac That's not relevant here. The compiler cannot know whether the OS is saying "don't do X", therefore it must assume the OS may say that. –  Oct 30 '16 at 00:26
  • @hvd Let me rephrase; in what sense *might* the OS be saying "don't do X"? – Veedrac Oct 30 '16 at 00:31
  • @Veedrac I thought I was clear on that already... An OS might be saying "don't attempt to modify the same memory in a copy-on-write page from two threads in the same process at the same time". –  Oct 30 '16 at 00:37
  • @Veedrac Perhaps, but again, not relevant here since the compiler cannot know that. And this has cost 10 comments when it should have been 2 at most, so please let's not continue. –  Oct 30 '16 at 00:55
  • 1
    @hvd If you're compiling for Linux, and Linux has threadsafe COW, it absolutely can assume COW is threadsafe. Agree to disagree, I suppose. – Veedrac Oct 30 '16 at 01:06
14

I am not sure about the behaviour of C++ here, but in C the memory might change because if the memory contains a non-zero value other than 1, it would remain unchanged with the check, but changed to 1 with the check.

But as I am not very fluent in C++, I don't know if this situation is even possible.

glglgl
  • 89,107
  • 13
  • 149
  • 217
  • Would this still be true about `_Bool`? – Ruslan Oct 28 '16 at 12:42
  • 6
    In C, if the memory contains a value that the ABI doesn't say is valid for its type, then it's a trap representation, and reading a trap representation is undefined behaviour. In C++, this could only happen when reading an uninitialised object, and it's reading an uninitialised object that's UB. But if you can find an ABI that says any non-zero value is valid for type `bool`/`_Bool` and means `true`, then in that particular ABI, you're probably right. –  Oct 28 '16 at 13:07
  • 1
    @Ruslan With compilers that use the Itanium ABI, and on ARM processors, C `_Bool` and C++ `bool` are either the same type, or compatible types that follow the same rules. With MSVC, they have the same size and alignment, but there's no official statement about whether they use the same rules. – Justin Time - Reinstate Monica Oct 28 '16 at 16:37
  • 1
    @JustinTime: C's `` includes a `typedef _Bool bool;` And yes, on x86 (at least in the System V ABI), `bool`/`_Bool` are required to be either 0 or 1, with the upper bits of the byte cleared. I don't think this explanation is plausible. – Peter Cordes Nov 01 '16 at 14:07
  • @PeterCordes Yes, C defines `bool` as a typedef of `_Bool`, but that doesn't mean it's always guaranteed to be the same type as C++ `bool`, unless the underlying ABI says that it is (also note that this typedef is in a header, not built into the language itself, because a lot of C code defines `bool` as a typedef of _`int`_). Case in point: `wchar_t` is a typedef for one of the integer types in C, but a distinct character type in C++ (albeit one with the same size and alignment as one of the integer types). – Justin Time - Reinstate Monica Nov 01 '16 at 16:40
  • That said, most ABIs define C++ `bool` and C `_Bool` (and by extension C `bool`) as either the same type, or equivalent types; the odd one out _might_ be whatever ABI Microsoft uses, but there's no way to be certain (evidence _suggests_ it implements them as the same type, but it's possible that what would be valid for one might be a trap representation for the other). It's generally safe to treat them as identical in any C program that includes `stdbool.h`, but it's not safe to assume C `bool` is always the same as C++ `bool` just because they have the same name. – Justin Time - Reinstate Monica Nov 01 '16 at 16:45
  • 1
    @JustinTime: That's true, I should have just pointed out that it definitely does have the same semantics in the all the x86 flavours of the System V ABI, which is what this question was about. (I can tell because the first arg to `func` was passed in RDI, while Windows would use RDX). – Peter Cordes Nov 02 '16 at 00:10
  • @PeterCordes Ah. That makes sense. – Justin Time - Reinstate Monica Nov 02 '16 at 01:18