
I have the following code snippet (https://godbolt.org/z/cE1qE9fvv), which contains a naive and a vectorized version of a dot product.

I decided to build the vectorized version as a standalone asm file, as follows:

    extern exit
    
    section .text
    global _start

    _start:
    mov  rax, 8589934593
    mov     QWORD [rsp-72], rax
    mov  rax, 17179869187
    mov     QWORD [rsp-64], rax
    mov  rax, 25769803781
    mov     QWORD [rsp-56], rax
    mov  rax, 34359738375
    mov     QWORD [rsp-48], rax
    mov  rax, 85899345930
    mov     QWORD [rsp-40], rax
    mov  rax, 171798691870
    mov     QWORD [rsp-32], rax
    mov  rax, 257698037810
    mov     QWORD [rsp-24], rax
    mov  rax, 343597383750
    mov     QWORD [rsp-16], rax
    movdqa  xmm1, [rsp-72]
    movdqa  xmm0, [rsp-24]
    pmulld  xmm1, [rsp-40]
    pmulld  xmm0, [rsp-56]
    paddd   xmm0, xmm1
    movdqa  xmm1, xmm0
    psrldq  xmm1, 8
    paddd   xmm0, xmm1
    movdqa  xmm1, xmm0
    psrldq  xmm1, 4
    paddd   xmm0, xmm1
    movd    eax, xmm0

    .exit:
    call    exit

I build it with: `nasm -f elf64 dot_product.asm && gcc -g -no-pie -nostartfiles -o dot_product dot_product.o`

The above code segfaults at `movdqa xmm0, XMMWORD PTR [rsp-72]`, which probably means that the data is not 16-byte aligned. However, the following screenshot seems to indicate the opposite: [screenshot of the GDB session]

Am I misunderstanding something?

Peter Cordes
Ferdinand Mom
  • How did you build your local executable? You tagged this [nasm], but the only code is GCC output, with code for a function that strangely fails to do constant-propagation. On entry to a function RSP%16 == 8. (Note that `_start` isn't a function; RSP%16==0 at the top of _start. And/or doing any pushes would change RSP, if I had to guess at likely reasons for the stack having different alignment wherever you copied this asm.) Anyway yes, you're correct that in your GDB session, `rsp` is 16-byte aligned so `rsp-72` is an odd multiple of 8. Not a [mcve]; GCC's asm is correct. – Peter Cordes Jun 07 '22 at 20:19
  • Also, re: your question title: immediate values exist in the machine code. Code alignment is irrelevant here, what matters is data alignment; the destination of a `mov [mem], imm` or where you store a register. – Peter Cordes Jun 07 '22 at 20:22
  • I build the following way: nasm -f elf64 dot_product.asm && gcc -g -no-pie -nostartfiles -o dot_product dot_product.o – Ferdinand Mom Jun 07 '22 at 20:23
  • With what source? Given that you used `-nostartfiles`, I guess you probably are writing your own `_start`, so yeah, it's not a function, RSP%16 == 0 on entry to `_start` and RSP points to `argc`, not a return address. – Peter Cordes Jun 07 '22 at 20:24
  • I edited the post to comply with the rules. Sorry for the inconvenience. – Ferdinand Mom Jun 07 '22 at 20:30
  • Ok yeah, `main` is a function, `_start` isn't. RSP%16 == 8 vs. 0. One dummy `push` would align the stack the way GCC code-gen for a function is expecting. Also, if the only libc function you're using is `exit`, you might as well just `mov eax, 231` / `syscall` to `exit_group(edi)`, so you can make a simple static executable with no libc (`gcc -nostdlib -static`, or just `ld foo.o`) – Peter Cordes Jun 07 '22 at 20:31
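
Putting the comments together, here is a minimal sketch of what the two suggested changes might look like: one dummy `push` at the top of `_start` so that RSP%16 == 8 (as GCC's code expects at function entry), and a raw `exit_group` syscall instead of calling libc's `exit`. It assumes x86-64 Linux and a CPU with SSE4.1 (for `pmulld`); the "first array" / "second array" comments are only a reading of the immediates, not taken from the original source.

```nasm
; Assumed build, no libc: nasm -f elf64 dot_product.asm && ld -o dot_product dot_product.o
section .text
global _start

_start:
    push    rax                     ; dummy push: RSP%16 goes from 0 to 8,
                                    ; so rsp-72, rsp-56, rsp-40, rsp-24 become 16-byte aligned

    ; first array {1..8}, two 32-bit ints per 64-bit store
    mov     rax, 8589934593         ; (2 << 32) | 1
    mov     QWORD [rsp-72], rax
    mov     rax, 17179869187        ; (4 << 32) | 3
    mov     QWORD [rsp-64], rax
    mov     rax, 25769803781        ; (6 << 32) | 5
    mov     QWORD [rsp-56], rax
    mov     rax, 34359738375        ; (8 << 32) | 7
    mov     QWORD [rsp-48], rax

    ; second array {10..80}
    mov     rax, 85899345930        ; (20 << 32) | 10
    mov     QWORD [rsp-40], rax
    mov     rax, 171798691870       ; (40 << 32) | 30
    mov     QWORD [rsp-32], rax
    mov     rax, 257698037810       ; (60 << 32) | 50
    mov     QWORD [rsp-24], rax
    mov     rax, 343597383750       ; (80 << 32) | 70
    mov     QWORD [rsp-16], rax

    movdqa  xmm1, [rsp-72]          ; aligned loads no longer fault
    movdqa  xmm0, [rsp-24]
    pmulld  xmm1, [rsp-40]          ; element-wise products, low halves
    pmulld  xmm0, [rsp-56]          ; element-wise products, high halves
    paddd   xmm0, xmm1

    movdqa  xmm1, xmm0              ; horizontal sum of the 4 lanes
    psrldq  xmm1, 8
    paddd   xmm0, xmm1
    movdqa  xmm1, xmm0
    psrldq  xmm1, 4
    paddd   xmm0, xmm1

    movd    edi, xmm0               ; result becomes the exit status argument
    mov     eax, 231                ; exit_group on x86-64 Linux
    syscall
```

Only the low 8 bits of the exit status are visible to the parent process. The dot product here is 2040, so `echo $?` should report 248 (2040 mod 256). Using the syscall also sidesteps a second problem: after the dummy push, RSP%16 == 8, which is not the alignment the ABI requires right before a `call` into libc.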
