
I'm currently learning x86 assembly language and wondered which is the better way to implement loops. One way would be to mov a value into the ecx register and use the loop instruction. The other way would be to use a jmp instruction, followed by the loop body, followed by a conditional jump back to the beginning of the loop body. I guess the first one has better readability, but other than that I don't know why I would use it.

rob
  • I never know when to accept an answer, since there may always be a better one, I guess. Is this really important? I really do not know. – rob Jul 24 '11 at 17:49
  • Related: [Why are loops always compiled like this?](https://stackoverflow.com/questions/47783926/why-are-loops-always-compiled-like-this): it's almost always best to use a `do{}while()` structure in asm, with a conditional branch at the bottom. If the loop might need to run 0 times, then jmp to the bottom is one strategy, but usually not the best. – Peter Cordes Feb 18 '18 at 07:00

1 Answer


When you mention jmp+body+test, I believe you are talking about the translation of a while loop in high-level languages. There is a reason for the second approach. Let's take a look.

Consider

x = N
while (x != 0) {
    BODY
    x--
}

The naive way is

    mov ecx, N      ; store var x in ecx register
top:
    cmp ecx, 0      ; test at top of loop
    je bottom       ; loop exit when while condition false
    BODY
    dec ecx
    jmp top
bottom:

This executes N+1 conditional jumps (N that fall through into the body, plus the final one that exits) and N unconditional jumps.

The second way is:

    mov ecx, N 
    jmp bottom
top:
    BODY
    dec ecx
bottom:
    cmp ecx, 0
    jne top

Now we still execute the same number of conditional jumps, but only ONE unconditional jump. A small saving, but it just might matter, because the jump we removed was executed on every iteration of the loop.
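To make the comparison concrete, here is a rough C sketch that mirrors the two layouts with goto and counts how many branch instructions would execute. The struct and function names are hypothetical, invented only for this illustration, and the count includes the final test that exits the loop, so the conditional total comes out as N+1 in both layouts.

```c
#include <stdint.h>

/* Hypothetical helper type: counts of executed branch instructions. */
struct jump_counts {
    unsigned conditional;   /* je / jne executions, taken or not */
    unsigned unconditional; /* jmp executions */
    unsigned body;          /* number of times BODY ran */
};

/* Layout 1: test at the top, unconditional jmp at the bottom. */
struct jump_counts count_test_at_top(uint32_t n)
{
    struct jump_counts c = {0, 0, 0};
    uint32_t ecx = n;               /* mov ecx, N */
top:
    c.conditional++;                /* cmp ecx, 0 / je bottom */
    if (ecx == 0) goto bottom;
    c.body++;                       /* BODY */
    ecx--;                          /* dec ecx */
    c.unconditional++;              /* jmp top */
    goto top;
bottom:
    return c;
}

/* Layout 2: a single jmp to the test, which sits at the bottom. */
struct jump_counts count_test_at_bottom(uint32_t n)
{
    struct jump_counts c = {0, 0, 0};
    uint32_t ecx = n;               /* mov ecx, N */
    c.unconditional++;              /* jmp bottom */
    goto bottom;
top:
    c.body++;                       /* BODY */
    ecx--;                          /* dec ecx */
bottom:
    c.conditional++;                /* cmp ecx, 0 / jne top */
    if (ecx != 0) goto top;
    return c;
}
```

For N = 3, the first layout executes 4 conditional and 3 unconditional jumps, while the second executes 4 conditional and 1 unconditional jump; BODY runs 3 times in both.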

Now you did mention the loop instruction, which is essentially the following (except that loop itself leaves the flags untouched):

dec ecx
cmp ecx, 0
jne somewhere
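As a small model of that behaviour, the C function below (a hypothetical helper, not anything from a real library) performs one step of the loop instruction on a 32-bit counter: it decrements first and then reports whether the jump would be taken, which happens when the counter is still non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models one execution of `loop target`: decrement ECX, then report
   whether the jump to target is taken (ECX != 0 after the decrement).
   The real instruction does not modify the flags; this model has no
   flags at all, so that detail is simply not represented. */
bool loop_step(uint32_t *ecx)
{
    *ecx -= 1;          /* unsigned arithmetic: 0 wraps to 0xFFFFFFFF */
    return *ecx != 0;   /* true means "jump back to target" */
}
```

Note what happens when the counter starts at 0: the decrement wraps to 0xFFFFFFFF, which is non-zero, so the jump is taken.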

How would you work that in? Probably like this:

    mov ecx, N
    cmp ecx, 0       ; Must guard against N==0
    je bottom
top:
    BODY
    loop top         ; built-in dec, test, and jump if not zero
bottom:
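The same structure can be sketched in C to check how often BODY runs. This is only an illustration with a made-up function name; the `if (--ecx != 0) goto top;` line stands in for the loop instruction.

```c
#include <stdint.h>

/* Returns how many times BODY would run for a given N when the
   loop instruction is protected by an explicit zero check. */
uint64_t guarded_loop_body_runs(uint32_t n)
{
    uint64_t runs = 0;
    uint32_t ecx = n;               /* mov ecx, N */
    if (ecx == 0) goto bottom;      /* cmp ecx, 0 / je bottom */
top:
    runs++;                         /* BODY */
    if (--ecx != 0) goto top;       /* loop top */
bottom:
    return runs;
}
```

Without the guard, N = 0 would enter the body once, wrap ecx around to 0xFFFFFFFF, and keep going for 2^32 iterations in total, which is why the check before the loop matters.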

This is a pretty little solution typical of CISC processors. Is it faster than the second way above? That depends a great deal on the architecture. I suggest you do some research on the performance of the loop instruction in the IA-32 and Intel 64 processor architectures, if you really want to know more.

Ray Toal
  • Thanks, that helped quite a bit. I'll try to do some further research on the speed of the ecx loop :) – rob Jul 24 '11 at 09:45
  • @rob, happy researching. May I suggest http://www.agner.org/optimize/optimizing_assembly.pdf ? An amazing resource. Very long. On page 89 it is mentioned that you should avoid JECXZ and LOOP because they are not so efficient on the more modern architectures. – Ray Toal Jul 24 '11 at 16:59
  • Related: [Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently) for some historical factors. Fun fact: AMD Bulldozer / Ryzen have fast `loop`, but nothing else does. Also related: [Why are loops always compiled like this?](https://stackoverflow.com/questions/47783926/why-are-loops-always-compiled-like-this) for efficient loop structures: as you say, conditional branch at the bottom, and various strategies if it might need to run 0 times. – Peter Cordes Feb 18 '18 at 07:01