The performance depends on the contents of the loops.
Let's decompose the for loop. A for loop consists of:
- Initialization
- Comparison
- Incrementing
- Content (statements)
- Branching
Let us define a comparison as a compare instruction (to set the processor status bits) and a branch (to take advantage of the processor status bits).
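As a rough illustration of that decomposition, here is a minimal sketch in C. The function names and the array-summing body are my own assumptions, not code from the question. The second function spells out the same loop with explicit labels and gotos so each part is visible; a real compiler may arrange the instructions differently.

```c
#include <stddef.h>

/* Sum an array with an ordinary for loop (hypothetical example body). */
long sum_for(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}

/* The same loop with each part of the for loop written out explicitly. */
long sum_decomposed(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;               /* initialization */
top:
    if (!(i < n)) goto done;    /* comparison: compare, then conditional branch */
    total += a[i];              /* content (statements) */
    ++i;                        /* incrementing */
    goto top;                   /* branching back to the top of the loop */
done:
    return total;
}
```

Only the `total += a[i];` line does work on the data; everything else is loop overhead.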
Processors are at their happiest when they are executing data instructions. The processor manipulates the data, then processes the next instruction in the pipeline (cache).
Processors don't like the comparison and the branch back to the top of the loop. Branching means that the processor has to stop processing data and execute logic to determine whether the instruction pipeline (cache) needs to be reloaded. That time could have been spent executing data instructions.
The primary goal in optimizing a for loop is to reduce the branching. The secondary goal is to optimize the data cache and memory accesses. A common optimization technique is loop unrolling, which basically means placing more statements inside the for loop. As a measurement, you can take the overhead of the for loop and divide it by the quantity of statements inside the loop.
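To make loop unrolling concrete, here is a minimal sketch in C. The function names, the scaling operation and the unroll factor of 4 are all assumptions chosen for illustration. In the simple version there is one data statement per comparison and branch; in the unrolled version there are four, so the overhead per statement is roughly a quarter.

```c
#include <stddef.h>

/* One data statement per comparison/branch pair. */
void scale_simple(int *a, size_t n, int k)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] *= k;
    }
}

/* Unrolled by 4 (an arbitrary, hypothetical factor):
   four data statements per comparison/branch pair. */
void scale_unrolled(int *a, size_t n, int k)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i]     *= k;
        a[i + 1] *= k;
        a[i + 2] *= k;
        a[i + 3] *= k;
    }
    /* Handle the leftover elements when n is not a multiple of 4. */
    for (; i < n; ++i) {
        a[i] *= k;
    }
}
```

Optimizing compilers can often perform this transformation themselves, so it is worth measuring before unrolling by hand.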
According to the above information, your first loop (with both assignment statements) would be more efficient, since there are more data instructions per iteration and so less overhead overall.
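For reference, the two shapes being compared look roughly like this. This is a hypothetical reconstruction: the array names, the assigned values and the function names are my assumptions, not your actual code. Both functions produce the same result; the single-loop version pays the comparison and branch n times, the two-loop version pays it 2n times.

```c
#include <stddef.h>

/* One loop, both assignments: loop overhead is paid n times. */
void fill_fused(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = (int)i;
        b[i] = 2 * (int)i;
    }
}

/* Two loops, one assignment each: loop overhead is paid 2n times. */
void fill_split(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = (int)i;
    }
    for (size_t i = 0; i < n; ++i) {
        b[i] = 2 * (int)i;
    }
}
```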
Edit 1: The Parallel Environment
However, your second example may be faster. The compiler could set up both loops to run in parallel (either through instructions or actual parallel tasks). Since the two loops are independent, they can run at the same time or be split between CPU cores. Processors have instructions that can perform a common operation on multiple memory locations at once. Your first example makes this a little more difficult because it requires more analysis from the compiler. Since the loops in the second example are simpler, the compiler's analysis is also simpler.
The quantity of iterations is also a factor. For small quantities, the loops should perform the same or have negligible differences. For large quantities of iterations, there may be some timing differences.
In summary: PROFILE. BENCHMARK. The only true answer depends on measurements. They may vary depending on the applications being run at the same time, the amount of memory (both RAM and hard drive), the quantity of CPU cores and other items. Profile and Benchmark on your system. Repeat on other systems.
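As a starting point for measuring, here is a minimal timing sketch in C using the standard `clock()` function. The names `loop_fn`, `best_time` and `sample_fill` are hypothetical helpers I made up for illustration. Note that `clock()` reports processor time rather than wall-clock time and its resolution varies by platform, so treat this as a rough first pass, not a substitute for a real profiler.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the loop under test. */
typedef void (*loop_fn)(int *a, int *b, size_t n);

/* Hypothetical sample workload: one loop with two assignments. */
void sample_fill(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = (int)i;
        b[i] = 2 * (int)i;
    }
}

/* Run fn `repeats` times and return the smallest processor time seen,
   in seconds. Returns -1.0 if repeats is zero or negative. */
double best_time(loop_fn fn, int *a, int *b, size_t n, int repeats)
{
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        clock_t start = clock();
        fn(a, b, n);
        clock_t stop = clock();
        double secs = (double)(stop - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best) {
            best = secs;
        }
    }
    return best;
}
```

Call `best_time` once per variant with the same arrays and iteration count, use large enough arrays that the time is well above the clock resolution, and make sure the results are actually used afterwards so the compiler cannot discard the loop.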