
I sometimes use this pattern to iterate over an array:

    mov [rsp+.r12], r12 ; Choose a register that calls inside the loop won't modify
    mov r12, -1
.i:
    inc r12
    cmp r12, [rbp-.array_size]
    je .end_i
    ; ... program logic ...
    jmp .i
.end_i:
    mov r12, [rsp+.r12]

I understand that testing for equality is enough, but shouldn't one test "defensively" for "greater or equal", to guard against a situation that should never occur?

Should one use je or jge in this case?

I am asking for a concrete tip that can reduce the likelihood of introducing bugs.

1201ProgramAlarm
Bulat M.
  • I've always kind of liked the idea of testing for a range instead of just for equality, in case a bit flips accidentally or something. But in x86 asm, keep in mind that `cmp/jge` can't macro-fuse on Core2 (in 32-bit mode), but `cmp/je` can. I thought that was going to be more relevant until I checked and found that it was only Core2, not Nehalem, that couldn't fuse that, since macro-fusion doesn't work at all in 64-bit mode on Core2. (Later microarchitectures don't have that limitation, and can macro-fuse more and more combinations.) – Peter Cordes Sep 18 '16 at 04:49
  • Why are you showing some weird spill/reload of r12 to free it up for use as a temporary counter, though? That's totally irrelevant (and doesn't look like efficient code). Surely there's some register that's already dead that you can use without saving. – Peter Cordes Sep 18 '16 at 04:52
  • @Peter, how should one write it properly? I think that r12 is a good choice, because function calls inside the loop, like printf, will not modify the call-preserved r12 register, so we do not need to manually save and restore the counter around calls. Please correct me if that is wrong. – Bulat M. Sep 18 '16 at 05:07
  • Oh is *that* what you meant with that code comment? Sure, r12 is a good choice, but ebx or ebp might be an even better choice if you don't need a 64-bit counter (no REX prefix). The main thing is that having a named spot in your stack frame called `.r12` is weird. Normally if you're going to name something, you use a label with semantic meaning. So if `r12` was previously holding `some_value`, you'd store that. And after the loop, maybe use r12 for something else. – Peter Cordes Sep 18 '16 at 05:25
  • You should save/restore registers with push/pop when you need to do that, because the code-size is smaller, but preferably not just around some small part of your function. e.g. save/restore r12 on function entry/exit, and use it for a couple different things at different points inside your function. You shouldn't save/restore two registers if you don't need two call-preserved registers at the same time; just reuse the one you're finished with. – Peter Cordes Sep 18 '16 at 05:27
  • You mean code like push ecx, push ecx ; call some_function ; pop ecx pop ecx ; (twice, to keep the stack aligned to 16B)? That is a little cumbersome. Could you please expand your answer to include these details? Short, succinct comments are harder for me to understand, as I am not very skilled yet. – Bulat M. Sep 18 '16 at 06:34
  • No, that's the opposite of what I was talking about. Updated my answer, thanks for letting me know you were getting lost with the short comments so I could expand on it instead of just wasting both our time :) – Peter Cordes Sep 18 '16 at 06:58

1 Answer

I've always kind of liked the idea of testing for a range instead of just for equality, in case a bit flips accidentally or something. But in x86 asm, keep in mind that cmp/jge can't macro-fuse on Core2 (in 32-bit mode), but cmp/je can. I thought that was going to be more relevant until I checked Agner Fog's microarch pdf and found that it was only Core2, not Nehalem, that couldn't fuse that, since macro-fusion doesn't work at all in 64-bit mode on Core2. (Later microarchitectures don't have that limitation, and can macro-fuse more and more combinations.)
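
As a rough sketch of what such a range test could look like with the loop shape from the question (test at the top, `jmp` at the bottom): the function name `sum_u32` and its arguments are hypothetical, and the code assumes NASM syntax and the x86-64 System V calling convention. It exits on `jae` (unsigned "above or equal") instead of `je`, so the loop still terminates if the counter somehow ends up past the size.

```nasm
; Hypothetical helper, not from the question:
;   uint32_t sum_u32(const uint32_t *a, size_t n)
; Assumes x86-64 System V: rdi = a, rsi = n, result in eax.
section .text
global sum_u32
sum_u32:
    xor     eax, eax            ; sum = 0
    xor     ecx, ecx            ; i = 0 (writing ecx zero-extends into rcx)
.i:
    cmp     rcx, rsi
    jae     .end_i              ; unsigned i >= n: also exits if i ever overshoots n
    add     eax, [rdi + rcx*4]  ; sum += a[i]
    inc     rcx
    jmp     .i
.end_i:
    ret
```

The unsigned condition is a deliberate choice here: a size is never negative, so an unsigned compare treats a corrupted "negative" counter as a huge value and leaves the loop, whereas the signed `jge` would not.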

Depending on the counter, you can usually count down without a CMP at all (dec/jnz). And often you know it doesn't need to be 64-bit, so you can use dec esi / jnz or whatever. dec esi / jge does work for signed counters, but dec doesn't set CF so you can't (usefully) use JA.

Your loop structure, with an if() break in the middle and a jmp at the end, is not idiomatic for asm. Normal is:

    mov ecx, 100

.loop:             ; do{
    ;; stuff
    dec ecx
    jge .loop      ; }while(--ecx >= 0)

You can use jg to only restart the loop with positive ecx, i.e. loop from 100..1 instead of 100..0.

Having a not-taken conditional branch plus a taken unconditional branch in every iteration is less efficient than a single conditional branch at the bottom of the loop.
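
A minimal sketch of the bottom-tested shape for a count-up loop over an array, with the zero-length case checked once before the loop. The function name `sum_u32_bottom` and its arguments are hypothetical; the code assumes NASM syntax and the x86-64 System V calling convention.

```nasm
; Hypothetical helper, not from the question:
;   uint32_t sum_u32_bottom(const uint32_t *a, size_t n)
; Assumes x86-64 System V: rdi = a, rsi = n, result in eax.
section .text
global sum_u32_bottom
sum_u32_bottom:
    xor     eax, eax            ; sum = 0
    test    rsi, rsi
    jz      .done               ; n == 0: skip the loop entirely
    xor     ecx, ecx            ; i = 0
.loop:                          ; do {
    add     eax, [rdi + rcx*4]  ;     sum += a[i]
    inc     rcx
    cmp     rcx, rsi
    jb      .loop               ; } while (++i < n);   unsigned compare
.done:
    ret
```

Each iteration now executes only one branch, and `jb` gives the same "range rather than equality" behaviour as a `>=` exit test at the top.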


Expanding on discussion in question comments about saving/restoring r12: Normally you'd do something like:

my_func:
    ; push rbp
    ; mov  rbp, rsp      ; optional: make a stack frame

    push   rbx           ; save the caller's value so we can use it
    sub    rsp, 32       ; reserve some space

    imul   edi, esi, 11   ; calculate something that we want to pass as an arg to foo
    mov    ebx, edi       ; and save it in ebx
    call   foo
    add    eax, ebx       ; and use value.  If we don't need the value in rbx anymore, we can use the register for something else later.

    ...  ;; calculate an array size in ecx

    test   ecx, ecx                ; test for the special case of zero iterations *outside* the loop, instead of adding stuff inside.  We can skip some of the loop setup/cleanup as well.
    jz    .skip_the_loop

    ; now use rbx as a loop counter
    mov    ebx, ecx
.loop:
    lea    edi, [rbx + rbx*4 + 10]
    call   bar                     ; bar(5*ebx+10);
    ; do something with the return value?  In real code, you would usually want at least one more call-preserved register, but let's keep the example simple
    dec    ebx
    jnz    .loop
.skip_the_loop:

    add   rsp, 32         ; epilogue
    pop   rbx

    ;pop  rbp             ; pointless to use LEAVE; rsp had to already be pointing to the right place for POP RBX
    ret

Notice how we use rbx for a couple things inside the function, but only save/restore it once.
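
For the follow-up in the comments about combining a frame pointer, a saved register and local stack space, one possible layout is sketched below. It is only an illustration under assumptions: `my_func2` and the external function `baz(int *out)` are hypothetical names, and the calling convention is x86-64 System V with NASM syntax. Compiler output with `-fno-omit-frame-pointer` remains the better reference.

```nasm
; Hypothetical: int my_func2(int x) { int tmp; baz(&tmp); return tmp + x; }
extern baz                      ; assumed to exist elsewhere: void baz(int *out)
section .text
global my_func2
my_func2:
    push    rbp
    mov     rbp, rsp            ; frame pointer: locals sit at fixed offsets from rbp
    push    rbx                 ; saved at [rbp-8]
    sub     rsp, 24             ; locals at [rbp-32 .. rbp-9]; rsp is 16-byte aligned again

    mov     ebx, edi            ; keep x in a call-preserved register
    lea     rdi, [rbp-24]       ; &tmp, a 4-byte local inside the reserved area
    call    baz
    mov     eax, [rbp-24]       ; tmp
    add     eax, ebx            ; tmp + x

    add     rsp, 24             ; release locals so rsp points at the saved rbx
    pop     rbx
    pop     rbp
    ret
```

The alignment arithmetic: on entry rsp is 8 bytes off a 16-byte boundary because of the return address, two pushes bring it to 8 off again, and reserving 24 bytes restores 16-byte alignment before the `call`.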

Peter Cordes
  • Pretty much my thoughts. The test usually comes at the end of the loop to save the extra JMP, and I avoid signed comparisons both because of macro fusion and for safety. A common mistake is to forget to test for negative inputs. Unsigned comparisons don't have this problem. – icecreamsword Sep 18 '16 at 05:12
  • I would also add that I write my loops in a variety of different styles depending on what is most convenient; from 0 to N-1 (INC/CMP/JB or INC/CMP/JBE); from N to 1 (DEC/JNZ); from N-1 to -1 (INC/JNC); and variations on the above with pointers rather than indices. – icecreamsword Sep 18 '16 at 05:14
  • @icecreamsword: It's funny how programming in asm usually leads to choosing the signedness and width of your function args a lot more carefully than in C. In C, some people just make everything array-related `size_t`. – Peter Cordes Sep 18 '16 at 05:15
  • @icecreamsword: yeah, looping to an end-pointer with CMP is really common, since indexed addressing modes [don't micro-fuse on Intel SnB-family CPUs](http://stackoverflow.com/questions/26046634/micro-fusion-and-addressing-modes/31027695#31027695). It's cheap to generate a one-past-the-end pointer from a start + length, using a single LEA. Thus the loop uses ADD 16 / CMP/JB. – Peter Cordes Sep 18 '16 at 05:31
  • @icecreamsword: re: your last comment: INC doesn't set CF, so your last N-1 to -1 example doesn't work. Perhaps you were thinking ADD? – Peter Cordes Sep 18 '16 at 06:54
  • Correct. I originally wrote ADD and SUB but redacted to INC and DEC because it was clearer. Since the Pentium 4, I generally have avoided INC/DEC. – icecreamsword Sep 18 '16 at 07:03
  • @Peter, now I understand how to save/restore a call-preserved register at entry/exit and use it for multiple purposes in one routine. Could you please also show in the answer how to use stack space? I mean, does one do `sub rsp, N` before or after pushing the call-preserved registers, how does one properly set up a frame (`mov rbp, rsp`) without frame pointer omission, and how are these aspects combined in one routine? – Bulat M. Sep 18 '16 at 07:13
  • @BulatM.: look at some compiler output with `-O3 -fno-omit-frame-pointer`. Write a function that calls an extern function so the compiler has to save/restore something. And maybe have it pass a pointer to a local variable, so the compiler has to reserve stack space. – Peter Cordes Sep 18 '16 at 07:24
  • @Peter, interestingly, even at -O2 it does not call that extern function at all: it duplicates its body in main, and drops the function entirely if it is declared static. At -O1, the extern function pushes rbx and rax (the slot where rax was pushed serves as a stack location while also aligning the stack to 16B, very tricky). One question: why does the compiler generate `addq $8, %rsp; popq %rbx` instead of `popq %rax; popq %rbx`? Is add faster than pop? – Bulat M. Sep 18 '16 at 08:30
  • @BulatM.: no, don't define the extern function, just write a prototype. It can't inline if the compiler doesn't have the definition... add vs. pop: it's a tradeoff, but gcc made that decision a while ago when the tradeoff was different (before CPUs had stack engines to make push/pop cheap). clang *does* tend to just pop a scratch reg to adjust by 8. – Peter Cordes Sep 18 '16 at 08:33
  • @Peter, about the idiomatic loop you showed: suppose one uses it to iterate over a heap-allocated array, and at some point that array is empty. The idiomatic loop executes at least once, possibly reading unmapped memory and crashing with a segmentation fault. To prevent that, I have been writing loops in the form shown in the question. What is the idiomatic way to skip such a loop when no iterations are needed? – Bulat M. Sep 18 '16 at 10:54
  • @BulatM.: if a loop might need to run 0 times, put a check *before* the loop, outside it. – Peter Cordes Sep 18 '16 at 15:52