6

I am trying to introduce a generic function with the semantics of ternary operator: E1 ? E2 : E3. I see that compiler is able to eliminate calculation of one of E2 or E3 depending on E1 condition for the ternary operator. However GCC misses this optimization in case of ternary function call (even when E2/E3 have no side effects).

In the listing below function ternary is written to behave similarly to the ternary operator. However GCC emits potentially heavy call to function f which seems can be eliminated for some input values (exactly how it is done for ternary operator) because f is declared with pure attribute - please look at the godbolt link for assembly code generated by GCC.

Is it something that could be improved in GCC (room for optimization) or does the C++ standard explicitly prohibit such kind of optimizations?

// Very heavy function
int f() __attribute__ ((pure));

inline int ternary(bool cond, int n1, int n2) {
    return cond ? n1 : n2;
}

int foo1(int i) {
    return i == 0 ? f() : 0;
}

int foo2(int i) {
    return ternary(i == 0, f(), 0);
}

Assembly listing with -O3 -std=c++11:

foo1(int):
  test edi, edi
  jne .L2
  jmp f()
.L2:
  xor eax, eax
  ret
foo2(int):
  push rbx
  mov ebx, edi
  call f()
  test ebx, ebx
  mov edx, 0
  pop rbx
  cmovne eax, edx
  ret

https://godbolt.org/z/HfpNzo

eXXXXXXXXXXX2
  • 1,540
  • 1
  • 18
  • 32
  • 2
    It "works" with `__attribute__ ((const))`. – melpomene Sep 13 '18 at 19:46
  • @melpomene: `__attribute__ ((const))` means the function can't even read globals, while `__attribute__((pure))` means it can read globals but has no side-effects. (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes). So I think it's a missed optimization to not optimize away the call, because we know it has no side-effects. And it works in the `foo1` case, but that's because the C abstract machine doesn't evaluate `f()` in the not-used side of the ternary. – Peter Cordes Sep 13 '18 at 19:49
  • 2
    @eXX: you should probably report this as a keyword=`missed-optimization` bug on GCC's bugzilla. https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc. – Peter Cordes Sep 13 '18 at 19:50
  • *"or does the C++ standard explicitly prohibit such kind of optimizations?"* The standard allows for any optimization provided the resulting executable's behavior is equivalent to an unoptimized one. See [The as-if rule](https://en.cppreference.com/w/cpp/language/as_if). – François Andrieux Sep 13 '18 at 19:51
  • 1
    Heh. If I change `foo1` to `int t = f(); return i == 0 ? t : 0;`, GCC doesn't optimize it (it always calls `f`), but then it compiles `foo2` into `jmp foo1`. – melpomene Sep 13 '18 at 19:53
  • 3
    One thing that will make this tricky is that there is absolutely no indication that the function call is expensive, and there *is* an indication that *not* calling the function is expensive: that requires an extra branch. –  Sep 13 '18 at 19:58
  • @hvd good point, but isnt "more than one brach" are reasonable estimate for the cost of a function call? – 463035818_is_not_an_ai Sep 13 '18 at 20:03
  • @user463035818 Indeed, if *some* reasonable estimate has been chosen by the GCC developers, then I would expect this optimisation to be performed. But I would not have any idea how a reasonable estimate would have been chosen. –  Sep 13 '18 at 20:06
  • Interesting that the latest clang does not optimize it even with const attribute: https://godbolt.org/z/58Vnt6 – eXXXXXXXXXXX2 Sep 13 '18 at 20:12
  • In the call to `foo2()` the language requires that all parameters to a function are evaluated before the function is called. **EVEN** if the function is inlined the compiler is forced to evaluate all those expressions as a requirement of the language. Now the compiler can then apply the `as-if` rule after the fact and try and optimize away the call to `f()` if there are no side affects but this is not really that simple an optimization. So your comparison is apples to oranges. – Martin York Sep 13 '18 at 21:24
  • What happens if you use lazy parameter evaluation using lambdas? `foo3(i, [](){return f();}, [](){return 0;}); – Martin York Sep 13 '18 at 21:28

1 Answers1

7

I see that compiler is able to eliminate calculation of one of E2 or E3 depending on E1 condition (as long as E2/E3 has no side effects) for the ternary operator.

The compiler doesn't eliminate it; it just never optimizes it into a cmov in the first place. The C++ abstract machine doesn't evaluate the not-used side of the ternary operator.

int a, b;
void foo(int sel) {
    sel ? a++ : b++;
}

compiles like so (Godbolt):

foo(int):
    test    edi, edi
    je      .L2                # if(sel==0) goto
    add     DWORD PTR a[rip], 1   # ++a
    ret
.L2:
    add     DWORD PTR b[rip], 1   # ++b
    ret

The ternary operator can only optimize to an asm cmov if neither input has any side-effects. Otherwise they're not exactly equivalent.


In the C++ abstract machine (i.e. the input to gcc's optimizer), your foo2 does always call f(), while your foo1 doesn't. It's no surprise that foo1 compiles the way it does.

For foo2 to compile that way, it would have to optimize away the call to f(). It's always called to create an arg for ternary().


There is a missed-optimization here, which you should report on GCC's bugzilla (use the missed-optimization keyword as a tag). https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc

A call to int f() __attribute__ ((pure)); should be able to be optimized away. It can read globals, but must not have any side effects. (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html)

As @melpomene discovered in comments, int f() __attribute__ ((const)); does give you the optimization you're looking for. An __attribute__((const)) function can't even read globals, only its args. (Thus with no args, it must always return a constant.)

HVD points out that gcc doesn't have any cost info for f(). Even if it could have optimized away the call to ((pure)) f() as well as to ((const)) f, maybe it didn't because it didn't know it was more expensive than a conditional branch? Possibly compiling with profile-guided optimization would convince gcc to do something?

But given that it made the call to ((const)) f conditional in foo2, gcc may just not know that it can optimize away calls to ((pure)) functions? Maybe it can only CSE them (if no globals have been written), but not optimize away entirely from a basic block? Or maybe the current optimizer just fails to take advantage. Like I said, looks like a missed-opt bug.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847