Currently using this 64-bit MASM code to call a C runtime function such as memcmp(). I recall this convention was from a GoAsm article on optimizations.
memcmp PROTO;:QWORD,:QWORD,:QWORD
PUSH RSP
PUSH QWORD PTR [RSP]
AND SPL,0F0h
MOV R8,R11
MOV RDX,R10
MOV RCX,RAX
SUB RSP,32
CALL memcmp
LEA RSP,[RSP+40]
POP RSP
Is this a valid optimized version below?
memcmp PROTO;:QWORD,:QWORD,:QWORD
PUSH RSP
PUSH QWORD PTR [RSP]
AND RSP,-16 ; new
MOV R8,R11
MOV RDX,R10
MOV RCX,RAX
LEA RSP,[RSP-32] ; new
CALL memcmp
LEA RSP,[RSP+40]
POP RSP
The justification for replacing
AND SPL,0F0h
with
AND RSP,-16
is that it avoids invoke partial register updates. Understanding fastcall stack frame
Replacing
SUB RSP,32
with
LEA RSP,[RSP-32]
is that ensuing instructions do not depend on the flags being updated by the subtraction
then not updating the flags will be more efficient as well.
Why does GCC emit "lea" instead of "sub" for subtraction?
In this case, are there other optimization tricks too?