If you untangle the macro definitions above, you'll eventually end up with this, which is what the processor is actually trying to execute:
mov rax, [rsi]
mul qword [rdx] ; <--- no segfault here
mov r8, rax
mov rax, [rsi]
mul qword [rdx+8] ; <--- segfault here
mov r9, rax
mov rax, [rsi+8]
mul qword [rdx] ; <--- segfault here
add r9, rax
mov rax, r8
mov rdx, r9
You said that the segmentation faults occur on the marked lines above.
So let's do a little psychic debugging (one of Raymond Chen's favorite kinds of debugging). Consider what happens after the first mul instruction: rax is set to the low part of the product, and rdx is set to the high part of the product.
That means that after the first mul, rdx has changed! It's no longer a pointer to y1 or to y2, but is instead the high 64 bits of x1*y1. That matches what you observed: the first mul still sees a valid pointer in rdx, and the two after it do not. Any later attempt to use rdx as a pointer will almost certainly fault, because it isn't one anymore.
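If it helps to see that clobbering outside of assembly, here is a minimal C sketch that models what one-operand mul does to rax and rdx. The names regs_t, mul_hi, model_mul, and rdx_survives_mul are made up for illustration; they aren't part of your code or of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two registers that one-operand MUL writes. */
typedef struct {
    uint64_t rax;
    uint64_t rdx;
} regs_t;

/* High 64 bits of the unsigned 128-bit product a * b, built from 32-bit halves. */
static uint64_t mul_hi(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    uint64_t carry = ((p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu)) >> 32;
    return p3 + (p1 >> 32) + (p2 >> 32) + carry;
}

/* Model of `mul qword [mem]`: rdx:rax = rax * operand (unsigned). */
static void model_mul(regs_t *r, uint64_t operand)
{
    uint64_t lhs = r->rax;
    r->rax = lhs * operand;        /* low 64 bits (wraps modulo 2^64) */
    r->rdx = mul_hi(lhs, operand); /* high 64 bits overwrite rdx */
}

/* Returns 1 if rdx still holds the pointer to y after one modelled MUL, else 0. */
int rdx_survives_mul(const uint64_t *y, uint64_t x1)
{
    assert(y != NULL);
    regs_t r = { .rax = x1, .rdx = (uint64_t)(uintptr_t)y };
    model_mul(&r, y[0]);
    return r.rdx == (uint64_t)(uintptr_t)y;
}
```

With y = {3, 4} and x1 = 5, the modelled rdx ends up as 0 (the high half of 15) rather than the address of y, which is the situation the second and third mul run into.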
So in order to fix this, you'll have to keep the pointer in another register, because every mul overwrites rdx. r10 isn't used by this code, and it's considered volatile (caller-saved), so we can safely use it to store a "backup" of the initial value of rdx. So something like this is likely sufficient to fix it:
%define a1 [rdi]
%define a2 [rdi+8]
%define x1 [rsi]
%define x2 [rsi+8]
%define y1 [r10] ; Change which register we're using as the
%define y2 [r10+8] ; pointer to 'r10'.
%define output1 rax
%define output2 rdx
%define res1 r8
%define res2 r9
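global abc
abc:
    mov r10, rdx ; Preserve the real pointer to 'y1' in 'r10'.
    mov output1, x1
    mul qword y1 ; rdx:rax = x1 * y1 (this overwrites rdx).
    mov res1, output1 ; res1 = low 64 bits of x1 * y1.
    mov output1, x1
    mul qword y2 ; Safe now: 'y2' is addressed through 'r10'.
    mov res2, output1 ; res2 = low 64 bits of x1 * y2.
    mov output1, x2
    mul qword y1
    add res2, output1 ; res2 += low 64 bits of x2 * y1.
    mov output1, res1 ; rax = res1
    mov output2, res2 ; rdx = res2
    ret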
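To sanity-check the fixed routine, you could compare its results against a plain C model. This is only a sketch that mirrors the instruction sequence as written; abc_model and its out-parameters are hypothetical, and it makes no assumption about how you actually declare abc on the C side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference model of the fixed routine.
 * x and y each point at two 64-bit words (x1, x2 and y1, y2).
 * out_rax and out_rdx receive what the routine leaves in rax and rdx.
 */
void abc_model(const uint64_t x[2], const uint64_t y[2],
               uint64_t *out_rax, uint64_t *out_rdx)
{
    assert(x != NULL && y != NULL && out_rax != NULL && out_rdx != NULL);

    /* uint64_t multiplication wraps modulo 2^64, which matches the low
       half that MUL leaves in rax. */
    uint64_t res1 = x[0] * y[0];  /* low 64 bits of x1 * y1 */
    uint64_t res2 = x[0] * y[1];  /* low 64 bits of x1 * y2 */
    res2 += x[1] * y[0];          /* plus low 64 bits of x2 * y1 */

    *out_rax = res1;
    *out_rdx = res2;
}
```

Like the assembly, the model keeps only the low half of x1 * y1 and does not fold the high half of that product into res2. Whether that is what you want depends on what abc is supposed to compute.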