I recently noticed that seemingly small changes in how a matrix is accessed in C can have a big impact on performance. For example, let's imagine we have these two fragments of C code. This one:
for(i = 0; i < 2048; i++)
{
for(j = 0; j < 2048; j++) {
Matrix[i][j] = 9999;
}
}
And this one:
for(j = 0; j < 2048; j++)
{
for(i = 0; i < 2048; i++) {
Matrix[i][j] = 9999;
}
}
The second version is 2x slower than the first version. Why? I think it has to do with memory management: in each loop, the first version accesses positions in the memory that are located next to each other, while the second version has to "jump" to different regions in each loop. Is this intuition right? Also, if I make the matrix small (64x64 for example), then there is no difference in performance. Why? I would appreciate if someone could provide an intuitive and rigorous explanation. I am using Ubuntu 14.04 LTS, by the way.