I'm currently learning x86 assembly language and wondered which is the better way to implement loops. One way would be to `mov` a value into the ecx register and use the `loop` instruction. The other would be to use a `jmp` instruction, followed by the loop body, followed by a conditional jump back to the beginning of the loop body. I guess the first one has better readability, but other than that I don't know why I would use it.
I never knew when to accept an answer, since there may always be a better one, I guess. Is this really important? I really don't know. – rob Jul 24 '11 at 17:49
Related: [Why are loops always compiled like this?](https://stackoverflow.com/questions/47783926/why-are-loops-always-compiled-like-this): it's almost always best to use a `do{}while()` structure in asm, with a conditional branch at the bottom. If the loop might need to run 0 times, then jmp to the bottom is one strategy, but usually not the best. – Peter Cordes Feb 18 '18 at 07:00
1 Answer
When you mention jmp+body+test, I believe you are talking about the translation of a `while` loop from a high-level language. There is a reason for the second approach. Let's take a look.
Consider:

```c
x = N;
while (x != 0) {
    /* BODY */
    x--;
}
```
The naive way is:

```asm
    mov ecx, N      ; store var x in ecx register
top:
    cmp ecx, 0      ; test at top of loop
    je bottom       ; exit the loop when the while condition is false
    ; BODY
    dec ecx
    jmp top
bottom:
```
This executes N+1 conditional jumps (the `je` runs one extra time, on the pass that exits the loop) and N unconditional jumps.
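To check the count, here is a small C model of the naive layout. It is a sketch, not compiled output: each `goto` stands in for one x86 jump, and `naive_top_tested` with its out-parameters is a hypothetical helper made up for illustration. It counts every conditional and unconditional jump executed for a given starting ECX value.

```c
#include <stdint.h>

/* Hypothetical model of the top-tested layout:
 *   mov ecx, N / top: cmp ecx, 0 / je bottom / BODY / dec ecx / jmp top / bottom:
 * Each goto mirrors one x86 jump. Returns how many times BODY ran and
 * reports the number of jumps executed through the out-parameters. */
uint64_t naive_top_tested(uint32_t n, uint64_t *cond_jumps, uint64_t *uncond_jumps)
{
    uint64_t body_runs = 0;
    uint32_t ecx = n;                 /* mov ecx, N */
    *cond_jumps = 0;
    *uncond_jumps = 0;
top:
    (*cond_jumps)++;                  /* cmp ecx, 0 / je bottom */
    if (ecx == 0)
        goto bottom;
    body_runs++;                      /* BODY */
    ecx--;                            /* dec ecx */
    (*uncond_jumps)++;                /* jmp top */
    goto top;
bottom:
    return body_runs;
}
```

For N = 5 this reports 6 conditional jumps (5 not taken, 1 taken on exit) and 5 unconditional ones.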
The second way is:

```asm
    mov ecx, N
    jmp bottom
top:
    ; BODY
    dec ecx
bottom:
    cmp ecx, 0
    jne top
```
Now we still execute the same number of conditional jumps, but only ONE unconditional jump. It is a small saving, but it just might matter, especially because it is inside a loop.
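The same kind of hypothetical C model for the rotated layout (again a sketch: `rotated_bottom_tested` is an illustrative name, and each `goto` stands in for one jump) shows the unconditional-jump count staying at one regardless of N, while the conditional branch at the bottom runs once per iteration plus once more on exit. This is the `do{}while()` shape with the test at the bottom.

```c
#include <stdint.h>

/* Hypothetical model of the rotated (bottom-tested) layout:
 *   mov ecx, N / jmp bottom / top: BODY / dec ecx / bottom: cmp ecx, 0 / jne top
 * Each goto mirrors one x86 jump. Returns how many times BODY ran. */
uint64_t rotated_bottom_tested(uint32_t n, uint64_t *cond_jumps, uint64_t *uncond_jumps)
{
    uint64_t body_runs = 0;
    uint32_t ecx = n;                 /* mov ecx, N */
    *cond_jumps = 0;
    *uncond_jumps = 1;                /* jmp bottom: the only unconditional jump */
    goto bottom;
top:
    body_runs++;                      /* BODY */
    ecx--;                            /* dec ecx */
bottom:
    (*cond_jumps)++;                  /* cmp ecx, 0 / jne top */
    if (ecx != 0)
        goto top;
    return body_runs;
}
```

For N = 5 this reports 6 conditional jumps and a single unconditional one.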
Now, you did mention the `loop` instruction, which is essentially:

```asm
    dec ecx         ; except that LOOP leaves the flags unchanged
    jnz somewhere   ; jump while ECX != 0 after the decrement
```
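A hedged C model of what `loop` does with a 32-bit ECX may help. The helper name `loop_insn` is hypothetical, and C cannot express the flags, so the "flags untouched" part lives only in the comment. The unsigned wraparound is the detail to notice: starting from ECX = 0, the decrement yields 0xFFFFFFFF and the branch is taken.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of `loop target` with a 32-bit counter:
 * decrement ECX (the real instruction leaves EFLAGS unchanged),
 * then branch if the decremented ECX is nonzero. */
bool loop_insn(uint32_t *ecx)
{
    *ecx -= 1;          /* unsigned: 0 wraps to 0xFFFFFFFF, as on the CPU */
    return *ecx != 0;   /* true means "jump to target" */
}
```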
How would you work that in? Probably like this:

```asm
    mov ecx, N
    cmp ecx, 0      ; must guard against N == 0
    je bottom
top:
    ; BODY
    loop top        ; built-in dec, test, and jump if not zero
bottom:
```
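The guard matters because of that wraparound: without it, N == 0 would make `loop` decrement ECX to 0xFFFFFFFF and run BODY 2^32 times. Below is a hypothetical C model of the guarded version. `guarded_loop_body_runs` is an illustrative name, and the two statements after BODY stand in for the single `loop top` instruction.

```c
#include <stdint.h>

/* Hypothetical model of:
 *   mov ecx, N / cmp ecx, 0 / je bottom / top: BODY / loop top / bottom:
 * Returns how many times BODY ran. */
uint64_t guarded_loop_body_runs(uint32_t n)
{
    uint64_t body_runs = 0;
    uint32_t ecx = n;          /* mov ecx, N */
    if (ecx == 0)              /* cmp ecx, 0 / je bottom: the N == 0 guard */
        goto bottom;
top:
    body_runs++;               /* BODY */
    ecx -= 1;                  /* loop top: decrement ECX ...            */
    if (ecx != 0)              /* ... and jump back while it is nonzero  */
        goto top;
bottom:
    return body_runs;
}
```

With the guard in place, BODY runs exactly N times, including zero times for N == 0.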
This is a pretty little solution, typical of CISC processors. Is it faster than the second way above? That depends a great deal on the specific processor. If you really want to know more, I suggest you research the performance of the `loop` instruction on the IA-32 and Intel 64 processor architectures.

Thanks, that helped quite a bit. I'll try to do some further research on the speed of the ecx loop :) – rob Jul 24 '11 at 09:45
@rob, happy researching. May I suggest http://www.agner.org/optimize/optimizing_assembly.pdf ? An amazing resource. Very long. On page 89 it is mentioned that you should avoid JECXZ and LOOP because they are not very efficient on more modern architectures. – Ray Toal Jul 24 '11 at 16:59
Related: [Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently) for some historical factors. Fun fact: AMD Bulldozer / Ryzen have fast `loop`, but nothing else does. Also related: [Why are loops always compiled like this?](https://stackoverflow.com/questions/47783926/why-are-loops-always-compiled-like-this) for efficient loop structures: as you say, conditional branch at the bottom, and various strategies if it might need to run 0 times. – Peter Cordes Feb 18 '18 at 07:01