
Let's assume I have to execute code that consists of five nested for loops. Let's call them:

  • A - 10 iterations (elements)
  • B - 15 iterations (elements)
  • C - 16 iterations (elements)
  • D - 20 iterations (elements)
  • E - 100 iterations (elements)

Is there any difference between looping them in this order:

A(B(C(D(E()))))

and

E(D(C(B(A()))))?

Or is some other order of the loops optimal?

My question is language independent. I'd like to know how to assess the cost of this code in order to write the fastest version.

Is there any difference in the iteration cost depending on the order of the loop sizes, or not?

Where should I start looking to learn more about this kind of problem?

user1146081
  • I already know the answer for the 2D example posted here: http://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra but what about 3+ dimensions? – user1146081 Mar 21 '15 at 19:17

2 Answers


Yes, there is a difference. Consider selecting the loop order that makes the memory accesses cache-friendly. If you have a multidimensional array that you access inside the loops, it should be traversed so that consecutive accesses touch neighboring memory locations.
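To illustrate the two access patterns, here is a Python sketch of the same 2D sum with both index orders. In a language with contiguous row-major arrays (C, for example), the `row_major` pattern is the cache-friendly one; Python's list-of-lists adds indirection that blunts the cache effect, so this only illustrates the pattern itself:

```python
N = 500
a = [[i * N + j for j in range(N)] for i in range(N)]

def row_major(a):
    total = 0
    for i in range(len(a)):         # outer loop over rows
        for j in range(len(a[0])):  # inner loop walks one row contiguously
            total += a[i][j]
    return total

def col_major(a):
    total = 0
    for j in range(len(a[0])):      # outer loop over columns
        for i in range(len(a)):     # inner loop jumps between rows
            total += a[i][j]
    return total

# Same result either way; only the memory-access order differs.
assert row_major(a) == col_major(a)
```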

However, a full answer to your question is not possible, as it depends on what you do inside the loops. If it isn't a memory access into a multidimensional array, the cache argument above doesn't apply.

I suggest benchmarking. Every time you need nested for loops, benchmark which order gives the best performance. It's simple, really, although for 5 loops there are 5! = 120 possible orders. However, 5 nested loops is not a typical use case, and in more typical cases such as 3 or 4 loops the benchmarking approach is quite feasible.
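A minimal benchmarking harness might look like the sketch below (Python; the loop body is a hypothetical stand-in -- substitute your real work). It times the two orders from the question; extending the final loop to `itertools.permutations("ABCDE")` would cover all 120 orders:

```python
import itertools
import timeit

sizes = {"A": 10, "B": 15, "C": 16, "D": 20, "E": 100}

def run(order):
    """Run the nested loops in the given order; the body is a stand-in."""
    count = 0
    for _ in itertools.product(*(range(sizes[k]) for k in order)):
        count += 1  # replace with the real loop body
    return count

# The inner body runs 10*15*16*20*100 = 4,800,000 times in every order.
assert run("ABCDE") == run("EDCBA") == 4_800_000

for order in ("ABCDE", "EDCBA"):
    t = timeit.timeit(lambda: run(order), number=1)
    print(order, f"{t:.3f}s")
```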

juhist

Given that there are 10·15·16·20·100 = 4.8 million passes through the inner loop, I doubt juhist's comments about cache-friendly array access are relevant--it's very unlikely you have a 5-dimensional, 4.8-million-element array involved. It's possible there are smaller arrays involved where cache efficiency could still help you, though.

The big thing I would be looking for is how to prune portions of this task: not only loops that don't have to be run at all, but also values that can be calculated in some outer loop rather than repeatedly recalculated at a deeper nesting level. Look for even parts of expressions that can be pulled out. Be very leery of any expression that refers only to loop variables from levels above the code doing the referencing--it can probably be hoisted.
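As a sketch of the hoisting idea with two of the loops (`expensive` is a hypothetical per-element computation, not from the question):

```python
def expensive(a):
    # hypothetical computation that depends only on `a`
    return a * a + 1

A = list(range(200))
B = list(range(300))

def naive(A, B):
    total = 0
    for a in A:
        for b in B:
            total += expensive(a) * b  # expensive(a) ignores b entirely
    return total

def hoisted(A, B):
    total = 0
    for a in A:
        fa = expensive(a)              # computed once per outer iteration
        for b in B:
            total += fa * b
    return total

def factored(A, B):
    # here the whole sum even separates: sum of f(a)*b = (sum f(a)) * (sum b)
    return sum(expensive(a) for a in A) * sum(B)

assert naive(A, B) == hoisted(A, B) == factored(A, B)
```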

If none of these optimizations are possible, I would keep the order you have them (largest loop innermost)--while the order has no effect on the number of times the inner loop body runs (4.8 million either way), it does have a small effect on the number of times the loops themselves are set up. Swapping A and E means roughly ten times as many loop setups: about 514,000 instead of about 51,000.
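The setup counts for each order can be computed directly (a Python sketch, counting each loop as entered once per combined iteration of the loops outside it):

```python
from math import prod

sizes = {"A": 10, "B": 15, "C": 16, "D": 20, "E": 100}

def loop_setups(order):
    """Number of times any of the five loops is set up, outermost first."""
    dims = [sizes[k] for k in order]
    # the loop at `depth` is entered prod(dims[:depth]) times
    return sum(prod(dims[:depth]) for depth in range(len(dims)))

assert loop_setups("ABCDE") == 50_561    # big loop innermost
assert loop_setups("EDCBA") == 514_101   # big loop outermost, ~10x more
```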

Loren Pechtel