Can compiler optimize member access when not used in the interim but updated later

Question

I understand that compiler optimizations can expose already existing bugs (Given that compiler optimizations do not introduce bugs). I also found a similar question: Can compiler optimization introduce bugs? and read a suggested article: Dangerous Optimizations which points to the pitfalls of compiler optimization.

I'm worried that the second check could be optimized away in this code; is that possible?

class Lock {
  void Acquire(bool is_exclusive);
  // Upgrades shared lock to an exclusive lock, if possible. 
  bool TryUpgrade();
  void Release();
}

class Foo {
  bool condition_boolean = true; 
  Lock lock;

  void f1() {
    lock.Acquire(false);
    if (condition_boolean) {
      if (!lock.TryUpgrade()) {
        lock.Release();
        lock.Acquire(true);
      }

      // Recheck the condition as we may have released the lock to 
      // acquire it exclusively.
      if (condition_boolean) {  // Can be optimized away.
        // Do something;
        condition_boolean = false;
      }
    }

    lock.Release();
  }
};

`bool condition_boolean` is a `private` member of `Foo`, so a compiler might be able to prove that only `Foo::f1()` ever accesses it. But it sounds like you're sure that it does get optimized away in some case, perhaps from looking at the asm? Do you have a [mcve] of that, e.g. on https://godbolt.org/? This code doesn't compile since the member functions of `class Lock` are all private, and missing a semicolon (https://godbolt.org/z/qxK3h8v3j), and it won't emit asm for an inline definition of `f1()`. — Peter Cordes, Jul 18 '23 at 01:13
@PeterCordes Compilers can't use `private`ness of members for optimization since there are many subtle ways to access private members from outside the class regardless. — user17732522, Jul 18 '23 at 01:34
@Aman Whether the compiler can optimize away the check depends on what the functions that you are calling are actually doing. Also, optimizations don't introduce bugs. They only make the bugs visible by causing unexpected behavior. Either your program has well-defined behavior and optimizations will not change it or your program has undefined behavior and it is bugged regardless of optimizations, even if the behavior without optimizations seemingly fits your expectations. (Exceptions are actual invalid optimizations and bugs in the compiler, which are rare.) — user17732522, Jul 18 '23 at 01:36
@user17732522: I wondered about that, like whether it would be UB for something to use `memcpy` to replace the object-representation of `f1`'s `*this` with one that had a different value for `condition_boolean`. If not, compilers would have to respect that possibility. — Peter Cordes, Jul 18 '23 at 01:51
The optimizer doesn't introduce bugs. The bugs are in the code. Enable compiler warnings, and address the warnings, to help remove bugs from the code. Use the sanitizers to help find bugs in the code. Use holistic static analysis tools like Coverity to find bad patterns in the code. — Eljay, Jul 18 '23 at 01:52
@PeterCordes Yes, if the class is trivially-copyable, then memcpy is allowed. But even if it wasn't, `private` simply doesn't prevent access. It only prevents naming the member and only in _most_ contexts. One well-known trick to access any private member is through explicit instantiation, which is one of the contexts where accessibility is _not_ checked, and with the help of friend injection and member pointers one can form a function that has access to the private member through the instantiation: https://godbolt.org/z/hedbTqf86 — user17732522, Jul 18 '23 at 02:32
@PeterCordes No, I do not know if it will be optimized away. I am curious to as what will happen. — Aman Deep Gautam, Jul 18 '23 at 05:18
@Eljay I understand the compiler does not introduce bugs. In other words, can optimization expose an already existing bug in the above case? — Aman Deep Gautam, Jul 18 '23 at 05:20
Your phrasing "*I am still not sure why **the following will be optimized away***" implies that you found a case where it is, or you saw someone else claiming that it will happen in this example. Since you don't mean that, rephrase. Perhaps "*I'm worried that the second check could be optimized away in this code; is that possible?*" — Peter Cordes, Jul 18 '23 at 06:00

score 3 · Accepted Answer · answered Jul 18 '23 at 18:41

Assuming that the Lock class is written properly, such that its acquire/release functions contain memory barriers of the corresponding types, then no, the second check cannot be optimized away.

Such optimizations are covered under the C++ standards "as-if rule" [C++20 N4860 intro.abstract p1]. Abstractly, the code must execute exactly as written. But the compiler can apply transformations if it can prove that, assuming the program is otherwise free of undefined behavior, the transformation would not change the observable behavior of the program. (Or, if the program has more than one valid way to execute, then at least it must ensure that it still behaves in one of those valid ways.)

In the case of multithreaded programs, the "assuming the program is otherwise free of undefined behavior" frequently comes into play, applied to data races. A typical example is:

bool b;

void foo() {
    int x=0, y=0;
    if (b)
        x=1;
    if (b)
        y=1;
    assert(x == y);
}

Here, the compiler may assume that b cannot change between the two tests, and optimize out the second one, thus turning the code into if (b) { x=1; y=1; }.

You might say, "but isn't it possible that some other thread modifies b in between?" Ah, well, if that were the case, you would have a data race. The data race rule says that for any two accesses to the same non-atomic variable, where at least one of them is a write, the program must have synchronization to ensure that one of them happens-before the other [intro.races p21]. Synchronization is typically provided by mutexes, or by an acquire load of an atomic variable that reads the value written by an earlier release store.

In the program above, the compiler can see no such synchronization is possible, because in between the two reads of b, the code contains no mutex operations nor any acquire or release operations. Therefore, any concurrent write of b would inevitably be a data race, which would be UB, so the compiler can assume it won't happen. (If the compiler's assumption is violated, then something bad will probably happen - but that's not the compiler's fault, it's your fault for invoking UB.)

However, in the code you posted, there is a way for synchronization to occur. Some other thread could take the lock while you have temporarily released it, modify condition_boolean, and then release the lock. This would be free of data races. Assuming as before that the Lock class is properly implemented, your release of the lock synchronizes with the other thread's acquisition, implying that your first read of condition_boolean happens before the other thread's write. Likewise, the other thread's release synchronizes with your re-acquisition, so the other thread's write happens before your second read.

Thus all the operations on condition_boolean are totally ordered by happens-before, so no data race occurs. Moreover, since the write to condition_boolean happens before the second read, the second read must observe the new value [intro.races p13]. To provide this behavior, the compiler must actually perform the second read; it cannot optimize it out.

Once again, all of this assumes that the Lock class is written correctly. If it is just a wrapper around std::shared_mutex or something like that, then everything should be fine, since the standard defines that std::mutex and friends provide the appropriate acquire and release semantics. If it's something you wrote yourself using std::atomics, then it's on you to ensure that you used a correct algorithm and appropriate std::memory_order to provide synchronization between your unlock and lock routines. If you wrote it without using std::atomic, then almost certainly it is wrong, causing a data race all by itself, and your program is broken regardless of what you do with condition_boolean. (And no, volatile does not substitute for std::atomic.)

Having said all that, even so, the compiler is free to break any of these rules if it can prove that it makes no difference to an otherwise UB-free program. As a simple example, it may be possible to specify via a compiler option that the program will run single-threaded. In this case, the compiler can demote all atomic objects to normal ones, and delete all mutex operations and memory barriers. It could also then delete the second check of condition_boolean. But of course, in such a case, it would be perfectly correct to do so, because condition_boolean in fact cannot change.

Can compiler optimize member access when not used in the interim but updated later

1 Answers1