
When compiling for CPUs that have AVX (for example with -march=sandybridge), GCC seems to always prefer the AVX (VEX-encoded) versions of simple scalar floating-point instructions over the SSE versions. For example, it uses vmulsd instead of mulsd.

I'm wondering whether there are any particular performance-related reasons for this, or whether it is just an implementation detail of GCC that makes it easier or more natural to schedule such instructions. From what I can tell from the sources I have (mostly Agner Fog's instruction tables), the AVX and SSE instructions seem to be equal in performance. I realize that AVX instructions are three-operand, but GCC almost always seems to use one of the source registers as the destination anyway.
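A minimal sketch for reproducing the observation. The function names `mul_scalar` and `sum_diff` are made up for illustration and do not come from any real code base. The first function is a plain scalar multiply. The second needs both inputs for two separate operations, which is the kind of case where a non-destructive three-operand form could matter.

```c
/* Hypothetical example: a single scalar double-precision multiply.
 * With AVX enabled, the multiply is expected to be emitted in its
 * VEX-encoded form (vmulsd) rather than the legacy SSE form (mulsd). */
double mul_scalar(double a, double b)
{
    return a * b;
}

/* Hypothetical example: both inputs are used by two operations, so
 * at least one result has to be written somewhere other than over
 * its own inputs. */
void sum_diff(double a0, double b0, double *a1, double *b1)
{
    *a1 = a0 + b0;
    *b1 = a0 - b0;
}
```

Compiling this with something like `gcc -O2 -march=sandybridge -S` and reading the generated assembly should show which encoding GCC picks. The exact instructions and register allocation will depend on the GCC version and flags.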

Dolda2000
  • The VEX-encoded version clears the upper bits, thereby reducing dependency. – Jester Jun 01 '16 at 16:14
  • Bad things can happen if 256-bit VEX instructions get mixed with SSE ones. Turning on AVX in the compiler forces everything to VEX so you can safely mix scalar, 128-bit, and 256-bit code with no issues. – Mysticial Jun 01 '16 at 16:17
  • Also, GCC shouldn't always be using the same register for destination and source. Maybe if optimizations are disabled, or if the code doesn't need to use the same operand twice. Try something like `a1 = a0 + b0; b1 = a0 - b0;`. At least one of them needs to be done out-of-place. – Mysticial Jun 01 '16 at 16:20
  • @Mysticial: "Bad things"? – Dolda2000 Jun 01 '16 at 16:23
  • http://stackoverflow.com/questions/7839925/using-avx-cpu-instructions-poor-performance-without-archavx/7841251#7841251 – Mysticial Jun 01 '16 at 16:27
  • @Mysticial: That certainly explains it. The linked page apparently also contains the explanation for the behavior, which is an interesting read. Thanks! – Dolda2000 Jun 01 '16 at 16:36
  • See also [Intel's nice diagram](https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx). – Peter Cordes Jun 01 '16 at 17:55
  • Voted to close as a duplicate. It's not *really* one, but the underlying reason is the same, and the connection is obvious. Basically, this question doesn't need a separate answer, so it can be marked a duplicate. – Peter Cordes Jun 01 '16 at 17:58
  • Possible duplicate of [Using AVX CPU instructions: Poor performance without "/arch:AVX"](https://stackoverflow.com/questions/7839925/using-avx-cpu-instructions-poor-performance-without-archavx) – phuclv Sep 08 '18 at 10:45

0 Answers