I've always kind of liked the idea of testing for a range instead of just for equality, in case a bit flips accidentally or something. But in x86 asm, keep in mind that cmp/jge can't macro-fuse on Core2 (in 32-bit mode), while cmp/je can. I thought that was going to be more relevant until I checked Agner Fog's microarch pdf and found that it's only Core2, not Nehalem, that can't fuse cmp/jge. And it makes no difference for 64-bit code on Core2, since macro-fusion doesn't work at all in 64-bit mode there. (Later microarchitectures don't have that limitation, and can macro-fuse more and more combinations.)
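To make the range-vs-equality trade-off concrete, here's a minimal sketch of the same count-up loop with the two kinds of loop test. The function names are made up for illustration, and the only difference is the final branch. jl is a signed condition like jge, so as far as I understand the Core2 restriction, the first cmp/jcc pair wouldn't fuse there, while the cmp/jne pair could in 32-bit code.

```nasm
; Hypothetical demo functions, NASM syntax. Only 32-bit registers are used,
; so the same source assembles as 32-bit or 64-bit code.
global count_up_range
global count_up_equal

section .text

; Both functions return the number of iterations (100) in eax.
count_up_range:
    xor   eax, eax          ; iteration count
    xor   ecx, ecx          ; i = 0
.loop:                      ; do {
    inc   eax               ;   stand-in for the loop body
    inc   ecx               ;   ++i
    cmp   ecx, 100
    jl    .loop             ; } while (i < 100);   range test: also exits if i ends up past 100
    ret

count_up_equal:
    xor   eax, eax          ; iteration count
    xor   ecx, ecx          ; i = 0
.loop:                      ; do {
    inc   eax               ;   stand-in for the loop body
    inc   ecx               ;   ++i
    cmp   ecx, 100
    jne   .loop             ; } while (i != 100);  equality test: keeps going if i skips 100
    ret
```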
Depending on the counter, you can usually count down without a CMP at all (dec/jnz). And often you know the counter doesn't need to be 64-bit, so you can use dec esi / jnz or whatever. dec esi / jge does work for signed counters, but dec leaves CF unmodified, so you can't (usefully) use JA.
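For example, a count-down loop over an array needs no CMP, because dec sets ZF itself. This is a minimal sketch, assuming the x86-64 System V calling convention (pointer in rdi, count in esi); sum_u32 is a made-up name, not something from the question.

```nasm
; uint32_t sum_u32(const uint32_t *arr, uint32_t n);   hypothetical helper
; Assumes x86-64 System V: arr in rdi, n in esi, return value in eax.
global sum_u32

section .text
sum_u32:
    xor   eax, eax                  ; sum = 0
    mov   ecx, esi                  ; 32-bit counter; writing ecx zero-extends into rcx
    test  ecx, ecx
    jz    .done                     ; n == 0: skip the loop entirely
.loop:                              ; do {
    add   eax, [rdi + rcx*4 - 4]    ;   sum += arr[n-1]
    dec   ecx                       ;   updates ZF/SF/OF, leaves CF alone
    jnz   .loop                     ; } while (--n != 0);
.done:
    ret
```

The mov into ecx is there because the upper half of rsi isn't guaranteed to be zero for a 32-bit argument, and the addressing mode uses the full 64-bit register.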
Your loop structure, with an if() break in the middle and a jmp at the end, is not idiomatic for asm. The normal structure is a do{}while() with the conditional branch at the bottom:
    mov   ecx, 100
.loop:                  ; do{
    ;; stuff
    dec   ecx
    jge   .loop         ; }while(--ecx >= 0)
You can use jg to only restart the loop with positive ecx, i.e. loop from 100..1 instead of 100..0.
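Having a not-taken conditional branch and a taken unconditional branch in every iteration is less efficient than a single conditional branch at the bottom of the loop: it costs an extra instruction, and an extra branch, per iteration.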
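Here's a sketch of the two shapes side by side, with a dummy inc eax standing in for the loop body. The function names are hypothetical. Both return 100; the first executes two branch instructions per iteration, the second only one.

```nasm
; Hypothetical demo functions, NASM syntax; both return 100 in eax.
global loop_break_in_middle
global loop_branch_at_bottom

section .text

loop_break_in_middle:
    xor   eax, eax
    mov   ecx, 100
.top:
    inc   eax               ; first part of the body
    dec   ecx
    jz    .out              ; if (--ecx == 0) break;   not taken until the last iteration
    ; (second part of the body would go here)
    jmp   .top              ; taken on every iteration except the last
.out:
    ret

loop_branch_at_bottom:
    xor   eax, eax
    mov   ecx, 100
.loop:                      ; do {
    inc   eax               ;   body
    dec   ecx
    jnz   .loop             ; } while (--ecx != 0);    the only branch in the loop
    ret
```

The do{}while() form always runs the body at least once, so if the count can be zero, check for that once before entering the loop rather than on every iteration.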
Expanding on discussion in question comments about saving/restoring r12: Normally you'd do something like:
my_func:
    ; push rbp
    ; mov  rbp, rsp           ; optional: make a stack frame
    push  rbx                 ; save the caller's value so we can use it
    sub   rsp, 32             ; reserve some space

    imul  edi, esi, 11        ; calculate something that we want to pass as an arg to foo
    mov   ebx, edi            ; and save it in ebx
    call  foo
    add   eax, ebx            ; and use value. If we don't need the value in rbx anymore,
                              ; we can use the register for something else later.

    ...                       ;; calculate an array size in ecx

    test  ecx, ecx            ; test for the special case of zero iterations *outside* the loop,
    jz    .skip_the_loop      ; instead of adding stuff inside. We can skip some of the loop setup/cleanup as well.

    ; now use rbx as a loop counter
    mov   ebx, ecx
.loop:
    lea   edi, [rbx + rbx*4 + 10]
    call  bar                 ; bar(5*ebx+10);
    ; do something with the return value? In real code, you would usually want
    ; at least one more call-preserved register, but let's keep the example simple
    dec   ebx
    jnz   .loop
.skip_the_loop:

    add   rsp, 32             ; epilogue
    pop   rbx
    ;pop  rbp                 ; pointless to use LEAVE; rsp had to already be pointing to the right place for POP RBX
    ret
Notice how we use rbx for a couple things inside the function, but only save/restore it once.