
Many SSE "mov" instructions specify that they are moving floating-point values. For example:

  • MOVHLPS—Move Packed Single-Precision Floating-Point Values High to Low
  • MOVSD—Move Scalar Double-Precision Floating-Point Value
  • MOVUPD—Move Unaligned Packed Double-Precision Floating-Point Values

Why don't these instructions simply say that they move 32-bit or 64-bit values? If they're just moving bits around, why do the instructions specify that they are for floating-point values? Surely they would work whether you interpret those bits as floating-point or not?

Josh Haberman

2 Answers


I think I've found the answer: some microarchitectures execute floating-point instructions on different execution units than integer instructions, and you get lower overall latency when a stream of instructions stays within the same "domain" (integer or floating point). This is covered in good detail in Agner Fog's microarchitecture manual, in the section titled "Data Bypass Delays": http://www.agner.org/optimize/microarchitecture.pdf

I found this explanation in this similar SO question: Difference between MOVDQA and MOVAPS x86 instructions?

Josh Haberman

  • Just commenting to confirm that this is correct. :) There is usually a 1-2 cycle latency for tossing a value across different domains. – Mysticial Apr 30 '13 at 07:53

In case anyone cares, this is exactly why Agner Fog's vectorclass has separate Boolean vector classes for use with floating-point vectors (Vec4fb) and integer vectors (Vec4i): http://www.agner.org/optimize/#vectorclass

In his manual he writes: "The reason why we have defined a separate Boolean vector class for use with floating point vectors is that it enables us to produce faster code. (Many modern CPU's have separate execution units for integer vectors and floating point vectors. It is sometimes possible to do the Boolean operations in the floating point unit and thereby avoid the delay from moving data between the two units)."

Most questions about SSE and AVX can be answered by reading his manual and, more importantly, looking at the code in his vectorclass.

  • Thanks for the reference! Agner Fog is incredible. I have no idea how one person can turn out as much useful code, docs, and info as he does. – Josh Haberman May 02 '13 at 23:02
  • So this is also the answer, I presume, to why there are ANDPS, ANDPD, and PAND, which are used to do the conditionals (and likewise for ANDN, OR, and XOR). So why is ANDPS different from ANDPD? Different pipeline profiles in the same execution unit, I guess... – greggo Jan 12 '16 at 19:53