
I would like to create a function that always returns zero, but this fact should not be obvious to the optimizer, so that subsequent calculations using the value won't constant-fold away due to the "known zero" status.

In the absence of link-time optimization, this is generally as simple as putting this in its own compilation unit:

int zero() {
  return 0;
}

The optimizer can't see across units, so the always-zero nature of this function won't be discovered.

However, I need something that works with LTO and, as far as possible, with clever future optimizations as well. I considered reading from a global:

int x;

int zero() {
  return x;
}

... but it seems to me that a sufficiently smart compiler could notice that x is never written to and still decide zero() is always zero.

I considered using a volatile, like:

int zero() {
  volatile int x = 0;
  return x;
}

... but the actual semantics of the required side effects of volatile reads aren't exactly clear, and would not seem to exclude the possibility that the function still returns zero.

Such an always-zero-but-not-at-compile-time value is useful in several scenarios, such as forcing a no-op dependency between two values. Something like: a += b & zero() causes a to depend on b in the final binary, but doesn't change the value of a.
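A minimal sketch of that idiom, assuming for illustration that zero() is the local-volatile candidate shown above (any zero that is opaque to the optimizer would do). The helper name add_dependency is hypothetical:

```cpp
// Placeholder for an optimizer-opaque zero. This is the volatile candidate
// from above, used here only as an assumption.
int zero() {
    volatile int x = 0;
    return x;
}

// Hypothetical helper: returns a with its value unchanged, but the result
// is now computed from b as well.
int add_dependency(int a, int b) {
    a += b & zero();  // b & 0 == 0, so a keeps its value
    return a;
}
```

As long as zero() really returns 0 at run time, add_dependency(a, b) equals a for every b. Whether the dependency on b survives in the binary depends on the compiler not proving that.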

Don't answer this by telling me the "standard doesn't guarantee any way to do this" - I'm well aware and I'm looking for a practical answer and not language from the standard.

BeeOnRope
  • I am curious now: why do you need to force a dependency between a and b? – coredump Jul 23 '18 at 13:02
  • @coredump - it lets you measure the latency of a function, as opposed to its throughput. For example, if the function returns `a`, and the next call takes `b` as input, making `b` depend on `a` will measure the latency by making the function calls serially dependent. – BeeOnRope Jul 23 '18 at 18:42
  • One way is by loading the value from one of the constant segment registers (e.g., DS) into a GPR and then subtract the constant to get zero. Segment registers are OS-dependent and so the compiler can never optimize that out. See [this](https://github.com/torvalds/linux/blob/050e9baa9dc9fbd9ce2b27f0056990fc9e0a08a0/arch/x86/include/asm/segment.h) for how the DS is initialized on 32-bit and 64-bit Linux. – Hadi Brais Jul 23 '18 at 20:14
  • @HadiBrais - sure, but if I wanted to do some platform specific x86 thing, I could just use inline assembly to do `mov reg, 0` or `xor reg, reg` which would work fine. I'm looking for something more or less in standard C++ that should work (the "it will work" part not being guaranteed by the standard, but the code used should be portable/standard). – BeeOnRope Jul 23 '18 at 20:17
  • Does that work? GCC might analyze the inline assembly and figure it out. See [https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile). – Hadi Brais Jul 23 '18 at 20:19
  • @HadiBrais - gcc never examines the inline assembly, it gets copied "as is" after variable replacement (formatting warts and all) into the assembly file _if it is copied at all_. The analysis it does is _outside_ the assembly function: without `volatile` it will treat the assembly as a function solely of its declared inputs and solely affecting its declared outputs, which means it can eliminate the assembly if the outputs are dead, if it is called repeatedly with the same inputs, etc. – BeeOnRope Jul 23 '18 at 20:24

4 Answers


I would be amazed if a compiler can figure this out:

#include <fstream>
#include <iostream>

int not_a_zero_honest_guv()
{
    // static makes sure the initialization code only gets called once
    static int const i = std::ifstream("") ? 1:0;
    return i;
}

int main()
{
    std::cout << not_a_zero_honest_guv();
}

This uses a complex, unpredictable runtime initialization of a function-local static. If the naughty little compiler figures out that opening an empty filename will always fail, then put some illegal filename in there instead.

Galik
  • `rand() != 0` would be something in the same vein.... almost-always-zero but the compiler definitely can't assume – M.M Jul 23 '18 at 07:17
  • This is a good one, although it's a bit of a shame to rely on file-system behavior. – BeeOnRope Jul 23 '18 at 20:17
  • Just curious, why wrap the `ifstream` call in a lambda? – BeeOnRope Jul 23 '18 at 20:25
  • @M.M - well `rand()` is _sometimes_ zero, and on a system with a small `RAND_MAX` maybe not even all that infrequently. I had considered `rand() >= 0`, which should always be true, but of course it's possible that the compiler understands it. Another approach would be to pass `rand()` into some function that a compiler can't easily prove returns zero, e.g., test the collatz conjecture or something. – BeeOnRope Jul 23 '18 at 20:28
  • @BeeOnRope Because I originally had a `std::fstream` variable and an `if()`construct. When I condensed it I didn't realise the lambda was no longer necessary. – Galik Jul 23 '18 at 20:49
  • @M.M I did consider a random number generator but if an implementation provides an inline version it is possible the compiler will figure out its value. – Galik Jul 23 '18 at 20:54
  • @BeeOnRope You could probably think of other functions that are opaque to the compiler because they call down to the operating system. Incidentally, on Unix-like OSs you can use `static int const i = []{ int i; std::ifstream("/dev/zero") >> i; return i; }();`. There may be something similar to `/dev/zero` on Windows? – Galik Jul 23 '18 at 21:02
  • @Galik - yeah I'm trying to think of a function that calculates something that is likely to be opaque to the compiler, e.g., the number of factors of a number, then I feed it random input like `rand()` or `clock()` and compare the result against a value I know can never occur. I'm stuck on the "likely to be opaque" part. – BeeOnRope Jul 23 '18 at 21:24
  • @BeeOnRope I am sure a compiler would not follow a `reinterpret_cast` like this: `std::uint32_t u = 0x48484848; static int const i = [&]{ int i; std::istringstream(std::string(reinterpret_cast<char const*>(&u), 4)) >> i; return i; }();` – Galik Jul 23 '18 at 21:58
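
The idea floated in the comments above (feed a runtime value into a computation whose result is hard to prove, then compare against an outcome that never occurs) could be sketched roughly as follows. This is only an illustration, not something proposed verbatim in the thread: the name collatz_zero and the step limit are assumptions, and it assumes RAND_MAX fits in 31 bits. Nothing stops a future compiler from seeing through it.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical name. Runs a bounded Collatz iteration from a runtime value
// and reports whether it failed to reach 1 within the step limit.
int collatz_zero() {
    // rand() is in [0, RAND_MAX]; adding 1 keeps the start value positive.
    std::uint64_t n = static_cast<std::uint64_t>(std::rand()) + 1;

    // Assumption: 2000 steps is far more than any 31-bit start value needs.
    // The explicit bound also matters because a side-effect-free loop
    // without one may be assumed to terminate and be removed entirely.
    for (int step = 0; step < 2000 && n != 1; ++step) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;  // unsigned, so no UB on overflow
    }
    return n != 1;  // 0 whenever the sequence reached 1
}
```

Unlike the file-stream version, this relies on no file-system behavior, but it costs a loop on every call unless the result is cached in a static.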

First an aside: I believe that the OP's third suggestion:

int zero() {
  volatile int x = 0;
  return x;
}

would in fact work (but this is not my answer; see below). Two weeks ago this exact function was the subject of "Is it allowed for a compiler to optimize away a local volatile variable?", with much discussion and differing opinions, which I will not repeat here. For a recent test, see https://godbolt.org/g/SA7k5P.


My answer is to add a static to the above, namely:

int zero() {
  static volatile int x;
  return x;
}

See some tests here: https://godbolt.org/g/qzWYJt.

Now with the addition of static, the abstract concept of "observable behavior" becomes more believable. With a little bit of work, I could figure out the address of x, especially if I disabled Address space layout randomization. This would probably be in the .bss segment. Then with a bit more work I could attach a debugger/hacking tool to the running process and then change the value of x. And with volatile, I have told the compiler that I might do this, so it is not allowed to change this "observable behavior" by optimizing x away. (It could perhaps optimize the call to zero away by inlining, but I don't care.)

The title of "Is it allowed for a compiler to optimize away a local volatile variable?" is a bit misleading, as the discussion centred on x being on the stack rather than on it being a local variable, so it is not applicable here. But we could change x from local scope to file scope or even global scope, as in:

volatile int x;
int zero() {
  return x;
}

This would not change my argument.
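
For context, here is a rough sketch of how the file-scope version might be used for the latency measurement described in the question's comments. The function work, the helper serial_ns_per_call, and the sink variable are hypothetical names introduced only for this illustration:

```cpp
#include <chrono>
#include <cstdint>

volatile int x;                  // zero-initialized and never written
int zero() { return x; }         // one volatile read per call

volatile std::uint32_t g_sink;   // keeps the results observably alive

// Hypothetical function under test.
std::uint32_t work(std::uint32_t v) { return v * 2654435761u + 1u; }

// Average nanoseconds per call when every call's input is made to depend
// on the previous call's result.
double serial_ns_per_call(int iterations) {
    std::uint32_t input = 1;
    std::uint32_t acc = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::uint32_t result = work(input);
        acc ^= result;
        input += result & zero();  // value unchanged, but now depends on result
    }
    auto stop = std::chrono::steady_clock::now();
    g_sink = acc;
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

The volatile load itself does not depend on result, so in principle only the AND and the ADD are added to the dependency chain. The measured figure still includes that small overhead.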


Further discussion:

Yes, volatiles are sometimes problematic: for example, see the pointer-to-volatile issues shown at https://godbolt.org/g/s6JhpL and in "Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?".

And yes, sometimes (always?) compilers have bugs.

But I would like to argue that this solution is not an edge case, and that there is a consensus among compiler writers, and I will do so by looking at existing analyses.

John Regehr's 2010 blogpost Volatile Structs Are Broken reports a bug where a volatile access was optimized away in both gcc and Clang. (It was fixed in three hours.) One commentator quoted the standard (emphasis added):

"6.7.3 ... What constitutes an access to an object that has volatile-qualified type is implementation-defined."

Regehr agreed, but added that there is consensus in how it should work on non-edge cases:

Yes, what constitutes an access to a volatile variable is implementation defined. But you have missed the fact that all reasonable C implementations consider a read from a volatile variable to be a read access and a write to a volatile variable to be a write access.

For further references, see:

These are reports about compiler bugs and programmers' errors. But they show how volatile should/does work, and that this answer meets those norms.

Joseph Quinsey
  • **Edit:** The OP ended their question with 'Don't answer this by telling me the "standard doesn't guarantee any way to do this" - I'm well aware and I'm looking for a practical answer and not language from the standard.' I think this more than meets this requirement. – Joseph Quinsey Aug 05 '18 at 03:25

You'll find that each compiler has an extension for achieving this.

GCC:

__attribute__((noinline))
int zero()
{
    return 0;
}

MSVC:

__declspec(noinline)
int zero()
{
    return 0;
}
Richard Hodges
  • It's not guaranteed and actually already doesn't work on `clang` today (despite that clang generally respects `noinline`), and gcc already does the type of inter-procedural analysis that could break this although it doesn't do it yet. – BeeOnRope Jul 23 '18 at 06:15
  • @BeeOnRope you're right. That should probably be reported as a bug. – Richard Hodges Jul 23 '18 at 06:30
  • does this attribute also have the semantic that the result of the function cannot be used for optimization? In the gcc manual it only describes the behaviour that it can't be inlined. (Which is orthogonal to the optimization issue) – M.M Jul 23 '18 at 07:20
  • For example `if ( !zero() ) foo();`, the compiler could (to my understanding) translate it as `zero(); foo();` in the presence of this attribute – M.M Jul 23 '18 at 07:22
  • @M.M Interesting. Neither gcc nor msvc does this. – Richard Hodges Jul 23 '18 at 09:24

On clang and gcc, clobbering a variable works, but imposes some overhead:

int zero()
{
    int i = 0;
    asm volatile(""::"g"(&i):"memory");
    return i;
}

which under -O3 on gcc gets compiled to

    mov     DWORD PTR [rsp-4], 0
    lea     rax, [rsp-4]
    mov     eax, DWORD PTR [rsp-4]
    ret

and on clang

    mov     dword ptr [rsp - 12], 0
    lea     rax, [rsp - 12]
    mov     qword ptr [rsp - 8], rax
    mov     eax, dword ptr [rsp - 12]
    ret
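
A variant that may avoid the round trip through the stack, assuming GCC or Clang extended asm: the "+r" constraint declares i as both an input and an output of an empty asm statement, so the compiler has to assume the statement may have changed it. This is a sketch rather than part of the answer above, and the name zero_reg is hypothetical:

```cpp
// GCC/Clang only: relies on extended inline asm.
int zero_reg() {
    int i = 0;
    // Empty asm that claims to read and write i in a register, so the
    // compiler can no longer treat i as a known constant afterwards.
    asm volatile("" : "+r"(i));
    return i;
}
```

Because no address is taken and no "memory" clobber is declared, the value can stay in a register. As with the original, this depends on the compiler never looking inside the asm string.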


Passer By