Is this a missed optimization opportunity or not

Question

I posted this answer. Code:

#include <atomic>
#include <utility>
void printImpl(...);

std::atomic<bool> printLog = false;

class Log {
 public:
  template <typename T>
  const auto& operator<<(T&& t) {
    if (printLog) {
      ulog.active = true;
      return ulog << std::forward<T>(t);
    } else {
      ulog.active = false;
      return ulog;
    }
  }

 private:
  struct unchecked_log {
    template <typename T>
    const auto& operator<<(T&& t) const {
      if (active) {
        printImpl(std::forward<T>(t));
      }
      return *this;
    }
    bool active{false};
  };
  unchecked_log ulog{};
};

// Instead of the macro. Doesn't break backward compatibility
Log LOG;

void test(bool) { LOG << 1 << 2 << 3 << 4 << 5 << 6 << 7 << 8 << 9 << 10; }

In essence, the code either ignores or logs all the data. The idea was to record the atomic<bool> in a regular bool which can be optimized out more easily. I thought most compilers could easily optimize out the if (active) part since there is no way it can change between calls. Turns out though, most compilers do inline the function call to unchecked_log::operator<< but do not optimize out the branching. Is there something preventing this optimization? Would it be illegal.

`Log::operator <<` depends on `printLog` though, which is a global variable. The compiler will expect that to change, you could use a constant if it really can't. And it's the same `ulog` object both ways, I think the standard might forbid that "optimisation" of `ulog::operator <<`, what if there was two threads for example? — Fire Lancer, Dec 11 '19 at 18:06

walnut · Accepted Answer · 2019-12-11T18:46:59.917

2

LOG is a global variable with external linkage. Therefore printImpl's definition in another translation unit can reach it and can potentially modify LOG.ulog.active between calls.

Make LOG a local variable in test and the repeated checks will be consolidated into one at the entry of test or leave LOG where it is and make it static, so a different compilation unit containing printImpl's definition cannot reach this translation unit's instance.

As mentioned in a comment below, alternatively let operator<< return by copy, so that the instance it returns (now a temporary) is unreachable for printImpl.

Note that the accessibility (private, etc.) of ulog and ulog.active does not matter. As soon as printImpl is able to obtain a pointer or reference to the instance of relevance, private does not protect from modification no matter what. Here are few examples of how that is possible (non-exhaustive):

Calling operator<< on LOG which may now change LOG.ulog.active based on an intervening modification of printLog or by const_casting the result
Calling the defaulted copy/move assignment operators
(since this is a standard-layout class) reinterpret_cast LOG to a reference to its ulog member
(since the classes are trivially copyable) memcpy a different state into LOG
placement-new a new instance of Log into LOG, which will make previous references reference the new object automatically, because Log satisfies the conditions for that
etc.

edited Dec 11 '19 at 18:46

answered Dec 11 '19 at 18:07

walnut

21,629
4
23
59

`LOG` has to be a global variable, but returning `LOG.ulog` by value [does enable the optimization](https://godbolt.org/z/mA-eg5) too. – Aykhan Hagverdili Dec 11 '19 at 18:10
`printImpl` could call `LOG.operator<<` and `const_cast` the result to change it's member. That's why returning by value enables the optimization. But returning by value had nothing to do with the `reinterpret_cast` case you mentioned in the answer. So compilers shouldn't have enabled the optimization? – Aykhan Hagverdili Dec 11 '19 at 18:19
@Ayxan But, every time you write a `const_cast`, a kitten dies. Never do that. `reinterpret_cast` is not much better.. – Jesper Juhl Dec 11 '19 at 18:23
@Ayxan The point is that the *`unchecked_log` instance* returned by-value is not reachable by `printImpl`, it has no way of getting a pointer or reference to it. If it has a pointer or reference, then it basically doesn't matter how you tried to protect yourself with `private` and friends, there are ways to modify members even if you don't have access to their names. (It doesn't matter for this case, I just mentioned it as an aside.) – walnut Dec 11 '19 at 18:23
@walnut I think `reinterpret_cast` method wouldn't work because `Log::unchecked_log` is a private class, so there is no way to name it in `reinterpret_cast(&LOG)`, hence the `const_cast` method was the only thing preventing the optimization. – Aykhan Hagverdili Dec 11 '19 at 18:28
@Ayxan You can `decltype` for it though. Anyway, I believe walnut means that you can `reinterpret_cast` to access the member if you get an instance of `unchecked_log`. – François Andrieux Dec 11 '19 at 18:29
@FrançoisAndrieux You're right. So the `printImpl` could do something like `auto l = LOG << 1; auto& ulog = *(reinterpret_cast(&LOG)); ulog.active = true;` So what compilers did was illegal? – Aykhan Hagverdili Dec 11 '19 at 18:34
@Ayxan If you return by-value or use a `LOG` without external linkage, then `printImpl` has no way of obtaining a reference to the instance of relevance, so it cannot use that. These hypotheticals are for the case where you keep using the global `LOG.ulog` with external linkage. Therefore the optimization is valid. – walnut Dec 11 '19 at 18:36
@walnut In [this](https://godbolt.org/z/cGWYRb) example `LOG` is still a global variable. So `printImpl` could still do all the evil things you mentioned in the answer and I mentioned in [this](https://stackoverflow.com/questions/59291618/is-this-a-missed-optimization-opportunity-or-not#comment104788826_59291693) previous comment. But `if (active)` seems to be optimized out. Why is it optimized then? Am I getting something wrong? I am really sorry if there is something obvious I am missing. – Aykhan Hagverdili Dec 11 '19 at 18:56
1

@Ayxan Yes `printImpl` can modfiy `LOG.ulog.active`, but that is not what you keep using in `test`. The first `<<` returns another `unchecked_log` instance by value, which is then manifested as a temporary in the `LOG << ...` expression in `test` and on which then the other `<<` are called. `printImpl` cannot reach this temporary instance if you don't hand it a reference to it explicitly. – walnut Dec 11 '19 at 19:02
@JesperJuhl I've written `const_cast` when I have a pair of functions, one of which takes a const pointer and returns another const pointer which is related, and another which takes a non-const pointer and returns a similar non-const pointer. The non-const calls the const function, and const-casts the result. No kittens harmed. – Martin Bonner supports Monica Dec 11 '19 at 19:05
@walnut Oh, now I get it. Thank you very much for the effort and time you spent on this question. – Aykhan Hagverdili Dec 11 '19 at 19:06
@MartinBonner Well, as long as none of the objects you cast const off were originally declared as const (can be hard to know) and they are not modified, then there's no UB and the kittens live. I'd still call it questionable practice though. – Jesper Juhl Dec 11 '19 at 19:07
1

@Ayxan Note however that since you are calling `printImpl` first, before constructing the return value, there will actually be two branches (see the assembly). It is not entirely optimal. To make it optimal you should first make a copy of `ulog`, that you then return later or make `ulog` entirely local to the `operator<<` overload of `Log`. – walnut Dec 11 '19 at 19:08

Is this a missed optimization opportunity or not

1 Answers1

Linked