Why does GCC prefer the AVX version of FP instructions?

Question

When compiling for CPUs that have AVX (such as with -march=sandybridge), GCC seems to always prefer the AVX versions of simple, scalar floating-point instructions over the SSE versions. For example, it uses vmulsd instead of mulsd.

I'm wondering whether there are any particular performance-related reasons for this, or whether it is just an implementation detail of GCC that makes it easier or more natural for it to schedule such instructions. From what I can tell from the sources I have (mostly Agner's instruction tables), the AVX and SSE instructions seem to be equal in performance. I realize that AVX instructions are three-operand, but GCC seems to almost always use the same register for the destination as for one of the source operands anyway.

The VEX-encoded version zeroes the upper bits of the full vector register, thereby reducing dependencies. — Jester, Jun 01 '16 at 16:14
Bad things can happen if 256-bit VEX instructions get mixed with SSE ones. Turning on AVX in the compiler forces everything to VEX so you can safely mix scalar, 128-bit, and 256-bit code with no issues. — Mysticial, Jun 01 '16 at 16:17
Also, GCC shouldn't always be using the same register for destination and source. Maybe if optimizations are disabled, or if the code doesn't need to use the same operand twice. Try something like `a1 = a0 + b0; b1 = a0 - b0;`. At least one of them needs to be done out-of-place. — Mysticial, Jun 01 '16 at 16:20
http://stackoverflow.com/questions/7839925/using-avx-cpu-instructions-poor-performance-without-archavx/7841251#7841251 — Mysticial, Jun 01 '16 at 16:27
@Mysticial: That certainly explains it. The linked page apparently also contains the explanation for the behavior, which is an interesting read. Thanks! — Dolda2000, Jun 01 '16 at 16:36
See also [Intel's nice diagram](https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx). — Peter Cordes, Jun 01 '16 at 17:55
Voted to close as a duplicate. It's not *really*, but the underlying reason is the same, and the connection is obvious. Basically, this question doesn't need a separate answer, so it can be marked a duplicate. — Peter Cordes, Jun 01 '16 at 17:58
Possible duplicate of [Using AVX CPU instructions: Poor performance without "/arch:AVX"](https://stackoverflow.com/questions/7839925/using-avx-cpu-instructions-poor-performance-without-archavx) — phuclv, Sep 08 '18 at 10:45
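
For anyone who wants to reproduce the observation from the question and try the `a1 = a0 + b0; b1 = a0 - b0;` suggestion from the comments, here is a minimal sketch in C. The function names (`mul`, `sum_diff`, `self_check`) and the file name `fp.c` are hypothetical and not from the original post. The instruction choices noted in the comments are what one would expect to see, not guaranteed output; register allocation depends on the GCC version and flags.

```c
/* Sketch only; function and file names are hypothetical.
   Compare the assembly produced by, for example:
       gcc -O2 -S fp.c           (baseline x86-64: legacy SSE encodings expected)
       gcc -O2 -mavx -S fp.c     (AVX enabled: VEX encodings expected)       */
#include <assert.h>

/* One scalar multiply: expected to show up as mulsd or vmulsd. */
double mul(double x, double y)
{
    return x * y;
}

/* a0 and b0 are each read twice, so at least one result cannot simply
   overwrite one of its own inputs. */
void sum_diff(double a0, double b0, double *a1, double *b1)
{
    *a1 = a0 + b0;
    *b1 = a0 - b0;
}

/* Small self-check so the functions can be exercised from a test driver. */
void self_check(void)
{
    double s, d;
    sum_diff(3.0, 1.0, &s, &d);
    assert(s == 4.0 && d == 2.0);
    assert(mul(1.5, 2.0) == 3.0);
}
```

In `sum_diff`, the two-operand SSE forms overwrite one of their inputs, so the compiler typically needs an extra register copy to keep `a0` or `b0` alive for the second operation. The three-operand VEX forms can write the result to a separate register directly, which is where the out-of-place case mentioned in the comments comes in.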
