What's the difference between the assembly instructions LOOP, LOOPE and LOOPNE?
- Don't use these instructions; they are very slow on modern CPUs. Instead use branching by hand. – alex strange Nov 23 '09 at 08:22
- @Alex Strange: do you have any evidence to support your statement? Thanks. – Timotei Oct 26 '10 at 11:23
- @Timotei Dolean: See the instruction tables in http://agner.org/optimize/. A CPU textbook that discusses microcoding (and hopefully some do) will explain the reasoning. – alex strange Oct 26 '10 at 22:25
- @alexstrange: related: **[Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?](https://stackoverflow.com/q/35742570)** has some uop counts and throughput numbers for `loop` on various recent microarchitectures, and some of the history behind how we ended up in this catch-22 situation of: nobody uses it because it's slow / not worth making faster because nobody uses it. If it was fast, it would often save code size, and be great for `adc` loops (especially on CPUs with partial-flag stalls like Nehalem and earlier.) – Peter Cordes May 28 '18 at 05:51
4 Answers
LOOP decrements ecx and checks whether ecx is non-zero. If it is, it jumps to the specified label; otherwise it falls through.
LOOPE decrements ecx and checks that ecx is non-zero and ZF is set. If both conditions are met, it jumps to the label; otherwise it falls through.
LOOPNE is the same as LOOPE, except that it requires ZF to be clear (i.e. zero) to take the jump.
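The three rules above can be written down as a small model. This is only a sketch in Python rather than assembly, meant to make the branch condition explicit; the function name `loop_step` and its parameters are made up for illustration, and it assumes a 32-bit counter (ecx).

```python
def loop_step(ecx, zf, kind="loop"):
    """Model one execution of LOOP, LOOPE or LOOPNE (hypothetical helper).

    ecx  -- counter value before the instruction
    zf   -- zero flag (0 or 1) left by an earlier instruction; only read here
    kind -- "loop", "loope" or "loopne"
    Returns (new_ecx, taken), where taken means "jump to the label".
    """
    ecx = (ecx - 1) & 0xFFFFFFFF  # decrement, wrapping like a 32-bit register
    if kind == "loop":
        taken = ecx != 0
    elif kind == "loope":
        taken = ecx != 0 and zf == 1
    elif kind == "loopne":
        taken = ecx != 0 and zf == 0
    else:
        raise ValueError("unknown instruction: " + kind)
    return ecx, taken
```

For example, `loop_step(1, 0)` returns `(0, False)`: the counter hits zero and execution falls through. Note that `loop_step(0, 0)` wraps the counter around and takes the jump, which is why code that may enter a loop with ecx equal to zero usually guards it with JECXZ first.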

- Although not asked, I'd like to point out that all LOOP instructions are much slower than the DEC ECX / JNZ counterpart. This is intended, as LOOP should nowadays only be used for delay calibration loops in hardware drivers and the like. – Nils Pipenbrinck Nov 18 '09 at 15:23
- @NilsPipenbrinck: On which processors is it slower? What's your source? – Janus Troelsen May 18 '13 at 19:56
- @JanusTroelsen, it's slower from the 80486 onwards. On the latest processors it's **a lot** slower. Source: http://www.agner.org/optimize/ manual #2. – Johan Oct 18 '13 at 14:14
- @sharptooth, speaking of LOOPE, how can ECX be non-zero and ZF be set after the decrement? Does LOOPE not affect the ZF flag? – golem Sep 04 '15 at 18:35
- Answering my own question. After checking it in gdb I can confirm that none of the loop instructions (LOOP, LOOPE, LOOPNE) affects the ZF flag when it decrements the ECX counter. Now it makes sense. – golem Sep 04 '15 at 18:56
Time for a Google Books Reference
EDIT: Synopsis from link: LOOPE and LOOPNE are essentially LOOP instructions with one additional check. LOOPE loops "while zero flag," meaning it keeps looping as long as the zero flag ZF is one and the counter has not reached zero. LOOPNE loops "while not zero flag," meaning it keeps looping as long as ZF is zero and the counter has not reached zero. Keep in mind that neither of these instructions itself changes ZF; the flag must be set by an earlier instruction in the loop body, such as CMP.
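Because the loop instructions leave ZF alone, the flag still reflects the last compare once the loop exits, which is what makes LOOPNE usable for searches. Below is a rough Python model of such a search. The assembly in the comments is an assumed illustration rather than code from the linked book, and `find_byte` is a hypothetical name.

```python
def find_byte(data, target):
    """Model of a LOOPNE search loop (illustrative, not from the source).

    Assumed assembly being modeled:
            mov  ecx, len        ; counter
            mov  esi, buf - 1    ; one before the first byte
        next:
            inc  esi             ; INC changes ZF, so it comes before CMP
            cmp  byte [esi], al  ; sets ZF = 1 on a match
            loopne next          ; dec ecx; jump if ecx != 0 and ZF == 0
            ; here: ZF == 1 means "found at esi"
    """
    ecx = len(data)
    if ecx == 0:
        return None  # real code would skip the loop, e.g. with JECXZ
    i = -1
    while True:
        i += 1                                # inc esi
        zf = 1 if data[i] == target else 0    # cmp byte [esi], al
        ecx -= 1                              # loopne: decrement, ZF untouched
        if not (ecx != 0 and zf == 0):        # loopne: fall through
            break
    # Test ZF, not ecx: a match on the last byte exits with ecx == 0 too.
    return i if zf == 1 else None
```

With these assumptions, `find_byte(b"hello", ord("l"))` returns `2` and `find_byte(b"hello", ord("z"))` returns `None`. A match on the final byte (`ord("o")`) also leaves the counter at zero, so only ZF tells the two exits apart.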

- I believe that it is best to not only provide a link, but quote relevant material from the source, should the link ever become invalid. – Thomas Owens Nov 18 '09 at 14:22
The LOOP instructions, as well as JCXZ/JECXZ, are a bit slow; however, they still have their place in modern code.
High speed is not always a concern in loops. For example, if we are executing a loop only once during program init and the iteration count is small, the time required will not be noticed.
Another example is a loop where Windows API functions are called; the time spent in the API call probably makes the LOOP execution time trivial. Again, this applies when the iteration count is small.
Consider these instructions as "another tool in your toolbox"; use the right tool for the job ;)

Have you tried looking it up in an instruction set reference, for example in this one by Intel?
