I have the following code snippet (https://godbolt.org/z/cE1qE9fvv), which contains a naive and a vectorized version of a dot product.
I decided to compile the vectorized version as a standalone asm file, as follows:
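    extern exit
    section .text
    global _start
    _start:
        ; a[0..7] = {1, ..., 8}: each 64-bit store writes two 32-bit ints
        mov rax, 8589934593         ; {1, 2}
        mov QWORD [rsp-72], rax
        mov rax, 17179869187        ; {3, 4}
        mov QWORD [rsp-64], rax
        mov rax, 25769803781        ; {5, 6}
        mov QWORD [rsp-56], rax
        mov rax, 34359738375        ; {7, 8}
        mov QWORD [rsp-48], rax
        ; b[0..7] = {10, ..., 80}
        mov rax, 85899345930        ; {10, 20}
        mov QWORD [rsp-40], rax
        mov rax, 171798691870       ; {30, 40}
        mov QWORD [rsp-32], rax
        mov rax, 257698037810       ; {50, 60}
        mov QWORD [rsp-24], rax
        mov rax, 343597383750       ; {70, 80}
        mov QWORD [rsp-16], rax
        movdqa xmm1, [rsp-72]       ; a[0..3]
        movdqa xmm0, [rsp-24]       ; b[4..7]
        pmulld xmm1, [rsp-40]       ; a[0..3] * b[0..3]
        pmulld xmm0, [rsp-56]       ; b[4..7] * a[4..7]
        paddd xmm0, xmm1            ; four partial sums
        movdqa xmm1, xmm0
        psrldq xmm1, 8              ; fold the high 64 bits onto the low 64 bits
        paddd xmm0, xmm1
        movdqa xmm1, xmm0
        psrldq xmm1, 4              ; fold lane 1 onto lane 0
        paddd xmm0, xmm1
        movd eax, xmm0              ; eax = dot product
    .exit:
        call exit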
I use the following to build: nasm -f elf64 dot_product.asm && gcc -g -no-pie -nostartfiles -o dot_product dot_product.o
The above code segfaults at the first load, movdqa xmm1, [rsp-72],
which probably means that the data is not 16-byte aligned. However, the following screenshot seems to indicate the opposite:
Am I misunderstanding something?
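One way to test the alignment hypothesis without a debugger is sketched below. This is a hypothetical diagnostic, not part of the original program: the file name check.asm is assumed, and it uses the raw Linux exit system call rather than libc's exit. It exits with the low four bits of rsp-72 as its status, so a status of 0 would mean the address is 16-byte aligned and anything else would mean the movdqa operand is misaligned.

```nasm
; check.asm (hypothetical name): report the alignment of rsp-72 at _start.
global _start
section .text
_start:
    lea rdi, [rsp-72]   ; the address used by the first movdqa
    and edi, 15         ; keep the low 4 bits: 0 means 16-byte aligned
    mov eax, 60         ; Linux x86-64 system call number for exit
    syscall             ; exit(edi)
```

Assuming it is saved as check.asm, it could be built and run with nasm -f elf64 check.asm && ld -o check check.o && ./check; echo $?. For context, the System V x86-64 ABI specifies that rsp is 16-byte aligned at process entry, whereas on entry to a called function it is 8 bytes off that boundary because the return address has just been pushed. An offset that is aligned inside a compiled function therefore need not be aligned at _start.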