It depends. It might be faster to repeat a sequence of instructions one after another instead of looping over them. This technique is commonly known as loop unrolling. On the other hand, a loop that is not unrolled can turn out to be more efficient because the code is smaller, so it takes up less space in the instruction cache. Many CPUs can also recognize the loop pattern and predict its branch correctly. A partially unrolled loop is possible as well. For example, instead of executing 20 instructions straight or doing 20 loop iterations, one can do 5 loop iterations that execute 4 instructions each.
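To make the 20 vs. 5×4 example above concrete, here is a minimal C sketch that sums 20 integers in two ways: a rolled loop with 20 iterations, and a loop partially unrolled by a factor of 4 (5 iterations with 4 additions each). The names `sum_rolled`, `sum_unrolled4` and `N` are hypothetical, chosen only for illustration. Keep in mind that an optimizing compiler may rewrite either form into the other on its own.

```c
#include <stddef.h>

#define N 20 /* hypothetical element count, matching the example above */

/* Rolled version: 20 iterations, one addition per iteration. */
int sum_rolled(const int a[N])
{
    int sum = 0;
    for (size_t i = 0; i < N; i++) {
        sum += a[i];
    }
    return sum;
}

/* Partially unrolled by a factor of 4: 5 iterations, four additions each.
   This only works as written because N (20) is a multiple of 4; otherwise
   a remainder loop would be needed for the leftover elements. */
int sum_unrolled4(const int a[N])
{
    int sum = 0;
    for (size_t i = 0; i < N; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    return sum;
}
```

Called on an array holding 1 through 20, both functions return 210. Which one runs faster is exactly the architecture-dependent question: the unrolled version executes fewer loop branches but produces more code.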
Generally, it is hard to tell which approach is best without knowing what architecture you are targeting (that is, the make and model of the CPU). That is one reason people rarely write assembly by hand. Analyzing the pros and cons of different approaches, estimating their execution cost, and generating different code for different CPU makes and models is what compiler developers do. Everyone else writes code in their language of choice, and the compiler generates good assembly for the target platform, which works well in the vast majority of cases.
To answer your question, you could write both versions yourself and profile them to see which one wins. Alternatively, you could write the code in C, turn on optimizations for your platform (for example, the -O3 and -march switches), and look at what the compiler generates. It will usually make a good choice.
Hope it helps. Good luck!