Let's take a small and simple 2D array:
int a[2][2];
In memory it's laid out like this:
+---------+---------+---------+---------+
| a[0][0] | a[0][1] | a[1][0] | a[1][1] |
+---------+---------+---------+---------+
In the first variant of the loops, you iterate in the order a[0][0], a[0][1], a[1][0], and a[1][1]. You iterate over elements that sit next to each other in memory, so they are already in the cache.
In the second variant you iterate in the order a[0][0], a[1][0], a[0][1], and a[1][1].
See how you jump back and forth in the second variant? With larger arrays, the CPU has to fetch data from main memory more often, because the whole array no longer fits in the cache. More memory accesses mean lower efficiency and longer execution times.