
I read somewhere that GCC sometimes prefers not to use a conditional move (cmov) when compiling my code to assembly.

What are the cases where it might choose to do something other than conditional mov?

BeeOnRope
Idan C.
    I don't know the specifics for GCC. But for general information, see _Replacing conditional jumps with conditional moves_ in Agner Fog's [x86 optimization guide](https://www.agner.org/optimize/optimizing_assembly.pdf). – Michael Feb 01 '20 at 17:57

1 Answer


Compilers often favour if-conversion to cmov when both sides of the branch are short, especially with a ternary, where the C variable is always assigned. For example, `if (x) y = bar;` sometimes doesn't optimize to CMOV, but `y = x ? bar : y;` uses CMOV more often. This matters most when `y` is an array entry that otherwise wouldn't be touched: introducing a non-atomic RMW of it could create a data race not present in the source. (Compilers can't invent writes to possibly-shared objects.)
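A minimal sketch of the two source forms, to make the difference concrete. The function names `store_if` and `store_select` are hypothetical, and whether GCC actually emits `cmov` for either one depends on the GCC version, target, and options, so check the output of `gcc -O2 -S` rather than trusting the comments.

```c
/* Hypothetical helpers for illustration only. */

/* The store happens only when x is nonzero. Compiling this as
   load / cmov / unconditional store would invent a write to *y
   on the path where the source never writes it. */
void store_if(int *y, int x, int bar) {
    if (x)
        *y = bar;
}

/* The source itself writes *y on every call, so selecting the value
   (possibly with cmov) and then storing it adds no new write. */
void store_select(int *y, int x, int bar) {
    *y = x ? bar : *y;
}
```

Both functions leave the same value in `*y`; they differ only in whether the source performs a store when `x` is zero.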

The obvious case where if-conversion would be legal but not profitable is when there's a lot of work on both sides of an if/else, e.g. some multiplies and divides, a whole loop, and/or table lookups. Even if gcc can prove that it's safe to run both sides and select one result at the end, it would see that doing that much extra work isn't worth avoiding a branch.
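As a rough illustration of "a lot of work on both sides", here is a hypothetical function (`pick_heavy` is not from the linked questions) where one arm is a loop and the other is a division plus a table lookup. Computing both arms and selecting at the end would be legal here, but I would expect a compiler to keep the branch; that expectation is an assumption, not something measured.

```c
/* Hypothetical example: both arms are expensive, so running both and
   selecting one result would cost far more than a branch. */
unsigned pick_heavy(int cond, unsigned x, const unsigned table[16]) {
    unsigned r;
    if (cond) {
        /* Arm 1: a whole loop of multiplies and adds. */
        r = 0;
        for (unsigned i = 0; i < 16; i++)
            r += table[i] * (x + i);
    } else {
        /* Arm 2: a divide, a multiply, and a table lookup. */
        r = (x / 7u) * 13u + table[x % 16u];
    }
    return r;
}
```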

If-conversion to a data dependency (branchless cmov) is only even possible in limited circumstances. e.g. Why is gcc allowed to speculatively load from a struct? shows a case where it can/can't be done. Other blockers include a memory access that the C abstract machine wouldn't perform and that the compiler can't prove won't fault, or a non-inline function call that might have side effects.
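A small sketch of the "might fault" restriction, using two hypothetical functions. In the first, the load is only reached when the pointer is non-null, so the compiler can't hoist it above the check. In the second, the source already loads both values unconditionally, so only a select remains and a branchless form is at least legal.

```c
#include <stddef.h>

/* p may be NULL, so the load of *p has to stay behind the test:
   doing it unconditionally could fault where the source would not. */
int load_or_default(const int *p, int fallback) {
    return p ? *p : fallback;
}

/* Both loads happen in the source regardless of cond, so the
   compiler is free to pick between va and vb without a branch. */
int select_loaded(const int *a, const int *b, int cond) {
    int va = *a;
    int vb = *b;
    return cond ? va : vb;
}
```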

See also these questions about getting gcc to use CMOV.

See also Disabling predication in gcc/g++ - apparently gcc -fno-if-conversion -fno-if-conversion2 will disable use of cmov.

For a case where cmov hurts performance, see gcc optimization flag -O3 makes code slower than -O2 - GCC -O3 needs profile-guided optimization to get it right and use a branch for an if that turns out to be highly predictable. GCC -O2 didn't do if-conversion in the first place, even without PGO profiling data.

An example the other way: Is there a good reason why GCC would generate jump to jump just over one cheap instruction?

GCC seemingly misses simple optimization shows a case where a ternary has side effects in both halves. A ternary isn't like CMOV: only one side is even evaluated for side effects.
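To illustrate the point about side effects, here is a sketch with two hypothetical functions that each bump a counter. The C rules for `?:` say only the selected operand is evaluated, so the compiler can't call both and pick a result afterwards unless it can prove the calls have no observable effect.

```c
/* Counters make the side effects observable; all names are hypothetical. */
int calls_a;
int calls_b;

int side_a(void) { calls_a++; return 1; }
int side_b(void) { calls_b++; return 2; }

/* Only one of side_a / side_b runs per call, never both. */
int pick(int c) {
    return c ? side_a() : side_b();
}
```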

AVX-512 and Branching shows a Fortran example where GCC needs help from source changes to be able to use branchless SIMD. (Equivalent of scalar CMOV). This is a case of not inventing writes: it can't turn a read/branch into read/maybe-modify/write for elements that source wouldn't have written. If-conversion is usually necessary for auto-vectorization.
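The linked example is Fortran, but the same idea can be sketched in C. The function names below are hypothetical. The first loop stores only to elements above the limit; the second stores to every element in the source, so a load / blend / store sequence doesn't invent any write. Whether either loop is actually auto-vectorized depends on the compiler, target, and options (with masked stores available, the first form may vectorize too).

```c
#include <stddef.h>

/* Conditional store: elements <= limit are never written in the source. */
void clamp_branchy(int *a, size_t n, int limit) {
    for (size_t i = 0; i < n; i++)
        if (a[i] > limit)
            a[i] = limit;
}

/* Unconditional store: every element is written, so a branchless
   select followed by a plain store matches the source's writes. */
void clamp_select(int *a, size_t n, int limit) {
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] > limit ? limit : a[i];
}
```

The two functions produce the same array contents; the second one simply gives the compiler permission to write every element.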

Peter Cordes
  • _Compilers can't invent writes to possibly-shared objects_ - doesn't compiler assume that only a write to `std::atomic` variable is a write to a shared variable? A write to a non-atomic variable is data race free only when that variable isn't shared, isn't it? – Maxim Egorushkin Feb 01 '20 at 22:37
  • @MaximEgorushkin: If this thread never touches `arr[5]` (in the C abstract machine), only `arr[4]` and `arr[6]`, then introducing a non-atomic RMW of `arr[5]` isn't allowed. Some other thread might be writing it while our loop runs, and we could step on that write. Inventing writes can create "sharing" (and thus data races) that didn't exist in the source, and thus is disallowed. (Generally also turning a read into a RMW is disallowed, even though if we're reading then it would already be UB for another thread to be writing. Without HW data-race detection it might be ok in theory?) – Peter Cordes Feb 01 '20 at 22:42
  • @PeterCordes - another argument which doesn't involve concurrency is what happens if the memory is mapped read only? Compilers do this in practice (e.g., string constants), and will blow up if you try to write them, even with the same value, so that's an easy way to show this is not allowed without invoking concurrency (which is a more drawn out "proof"). – BeeOnRope Feb 02 '20 at 01:33
  • @BeeOnRope: True, but if the compiler can prove that any part of an array is written, then the whole thing can't be in read-only memory. Or at least a struct. Or is it legal (not UB) to `mprotect` part of an array so part of it is `const` but the rest isn't? Probably with the fairly strict don't-invent-writes behaviour of real compilers like GCC, that's fine. It's potentially undesirable for real-world compilers to invent writes because it could break COW sharing unnecessarily. – Peter Cordes Feb 02 '20 at 04:57
  • @PeterCordes - right, but I don't think they use that reasoning. I don't think you'll find answers to stuff like "is it legal (not UB) ... `mprotect`", since those things are already outside of the standard. So in that sense, the concurrency thing is perhaps a better reason since the motivation lies inside the standard. Things like "read only" are mostly outside (although the standard does accommodate them e.g., by banning modification of const-defined data). – BeeOnRope Feb 03 '20 at 23:56
  • So I think the question is more what the compilers want to support. I guess they would support not inventing writes on an element-by-element basis (i.e., they would support the mprotect scenario). Note that icc _does_ invent writes! I can reliably make it crash by passing a `const char *`-defined string to a method vectorized by icc with invented writes. – BeeOnRope Feb 03 '20 at 23:58
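The read-only-memory argument from these comments can be sketched in C. `replace_char` is a hypothetical helper, not the code that crashed under icc. In the C abstract machine it never stores to a string that contains no `from` character, so calling it on a string literal with no match is well-defined. A compiler that rewrote the loop as load / blend / unconditional store would write to memory that is typically mapped read-only.

```c
#include <stddef.h>

/* Hypothetical helper: replace every occurrence of `from` with `to`.
   A store is performed only for elements that actually match. */
void replace_char(char *s, size_t n, char from, char to) {
    for (size_t i = 0; i < n; i++)
        if (s[i] == from)
            s[i] = to;
}
```

For example, `replace_char((char *)"abc", 3, '-', '+')` performs no stores at all in the source, which is why an invented write there would be a compiler bug rather than a bug in the caller.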