I'm porting an algorithm of mine to assembly for ml64, half for sport, half to see how much performance I can actually gain.
Anyway, right now I'm trying to understand the stack frame setup. As far as I know, this example works as follows:
    push rbp        ; save the caller's base pointer on the stack
    mov rbp, rsp    ; set up this function's (the callee's) base pointer from the current stack pointer
    sub rsp, 32     ; Intel guide says each frame must reserve 32 bytes for the storage of the
                    ; 4 arguments usually passed through registers
    and spl, -16    ; 16-byte alignment?
    ; ... function body ...
    mov rsp, rbp    ; restore the stack pointer from the base pointer, releasing this frame
    pop rbp         ; restore the caller's base pointer
The two things I'm not getting are:
1. How does subtracting 32 from RSP do anything at all? As far as I know, apart from its duties in going from one stack frame to another, it's just another register, right? I suspect the subtraction is there for going into another stack frame rather than for use in the current one.
2. What is SPL, and why does masking it make something 16-byte aligned?