I have the following piece of code:

std::vector<double> a(1000000), b(1000000);  
// fill a with random doubles
for (int i = 0; i < b.size(); ++i) {  
    b[i] = a[i]*a[i] + a[i]; 
} 
std::cout << b[0] << std::endl; // make sure the compiler doesn't optimize the loop away

The generated assembly for the for loop (g++ -O3):

movupd  xmm1, XMMWORD PTR[rbp + 0 + rax]
movapd  xmm0, xmm1
mulpd   xmm0, xmm1
addpd   xmm0, xmm1
movups  XMMWORD PTR[r12 + rax], xmm0
add     rax, 16
cmp     rax, 8000000
jne     .L17

I can comprehend the assembly up to the following line:

movups  XMMWORD PTR[r12 + rax], xmm0

Here the result of a[i]*a[i] + a[i] is written back to memory (in a vectorized fashion). But why is the movups (single-precision) instruction used here rather than movupd (double-precision)?

Urwald
  • 3
    Hmm. As far as I can tell, the *only* difference between `MOVUPD` and `MOVUPS` is that the former has a `0x66` prefix byte. Maybe the compiler has spotted a code alignment trick by using the non-prefixed instruction once and the prefixed one once? – Adrian Mole Sep 13 '22 at 11:17
  • 2
    ... so maybe the question should be why does it use `MOVUPD` in the first instance? – Adrian Mole Sep 13 '22 at 11:19
  • 1
    The compiler should prefer `movups` if possible as the encoding is shorter. `movups` and `movupd` do the exact same thing. – fuz Sep 13 '22 at 12:35
  • Thank you guys for your replies. @fuz Could you please elaborate a little bit further? What do you mean by "encoding is shorter"? – Urwald Sep 13 '22 at 12:38
  • 2
    @Urwald `movupd xmm, xmm/m` is `66 0F 10 /r` whereas `movups xmm, xmm/m` is just `0F 10 /r`. I.e. the `66` prefix byte is absent, saving one byte of memory over `movupd`. – fuz Sep 13 '22 at 13:04
  • 1
    @Urwald Instructions are stored in memory/file as sequences of bytes (machine code). Your `movupd` takes 6 bytes, while your `movups` takes only 5. Live demo: https://shell-storm.org/online/Online-Assembler-and-Disassembler/?inst=movupd++xmm1%2C+XMMWORD+PTR%5Brbp+%2B+0+%2B+rax%5D%0D%0Amovups++XMMWORD+PTR%5Br12+%2B+rax%5D%2C+xmm0&arch=x86-64&as_format=inline#assembly. – Daniel Langr Sep 13 '22 at 13:17
  • 1
    [VEX prefixes encoding and SSE/AVX MOVUP(D/S) instructions](https://stackoverflow.com/q/51773399) shows disassembly for movups vs. movupd. – Peter Cordes Sep 13 '22 at 16:17
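
To make the byte counts from the comments above concrete, here is a minimal C++ sketch that writes out the encodings of the load and the store from the question, once as `movupd` and once as `movups`. The byte sequences are hand-assembled from the `66 0F 10 /r` / `0F 10 /r` forms quoted by fuz (and the corresponding `0F 11 /r` store form), so treat them as an assumption to check against an assembler rather than as authoritative output. The helper name `differs_only_by_prefix` is made up for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hand-assembled encodings (assumption of this sketch; verify with an assembler).

// movupd xmm1, XMMWORD PTR [rbp+0+rax]   (66 0F 10 /r, ModRM=4C, SIB=05, disp8=00)
constexpr std::array<std::uint8_t, 6> movupd_load{0x66, 0x0F, 0x10, 0x4C, 0x05, 0x00};
// movups xmm1, XMMWORD PTR [rbp+0+rax]   (0F 10 /r, same ModRM/SIB/disp8)
constexpr std::array<std::uint8_t, 5> movups_load{0x0F, 0x10, 0x4C, 0x05, 0x00};

// movupd XMMWORD PTR [r12+rax], xmm0     (66, REX.B=41, 0F 11 /r, ModRM=04, SIB=04)
constexpr std::array<std::uint8_t, 6> movupd_store{0x66, 0x41, 0x0F, 0x11, 0x04, 0x04};
// movups XMMWORD PTR [r12+rax], xmm0     (REX.B=41, 0F 11 /r, same ModRM/SIB)
constexpr std::array<std::uint8_t, 5> movups_store{0x41, 0x0F, 0x11, 0x04, 0x04};

// Returns true if `longer` is exactly `prefix` followed by the bytes of `shorter`.
template <std::size_t M, std::size_t N>
constexpr bool differs_only_by_prefix(const std::array<std::uint8_t, M>& longer,
                                      const std::array<std::uint8_t, N>& shorter,
                                      std::uint8_t prefix) {
    if (M != N + 1 || longer[0] != prefix) return false;
    for (std::size_t i = 0; i < N; ++i) {
        if (longer[i + 1] != shorter[i]) return false;
    }
    return true;
}

// The pd forms are the ps forms plus a single 0x66 prefix byte.
static_assert(differs_only_by_prefix(movupd_load, movups_load, 0x66),
              "load encodings should differ only by the 0x66 prefix");
static_assert(differs_only_by_prefix(movupd_store, movups_store, 0x66),
              "store encodings should differ only by the 0x66 prefix");
```

If these encodings are right, each `movups` the compiler emits in place of `movupd` saves one byte of code, which is consistent with the 6-byte versus 5-byte figures mentioned above.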
