0

Which version is more efficient and why? It seems that both make the same computations. The only thing I can think of is if the compiler recognizes that in (a) j does not change value and doesn't have to compute it over and over again. Any input would be great!

#define M /* some mildly large number */
double a[M*M], x[M], c[M];
int i, j;

(a) First version
for (j = 0; j < M; j++)
    for (i = 0; i < M; i++)
        c[j] += a[i+j*M]*x[i];

(b) Second version
for (i = 0; i < M; i++)
    for (j = 0; j < M; j++)
        c[j] += a[i+j*M]*x[i];
Samu
  • 101
  • 1
  • 7

1 Answers1

5

This is about memory-access patterns rather than computational efficiency. In general (a) is faster because it accesses memory with unit stride, which is much more cache-efficient than (b), which has a stride of M. In the case of (a) each cache line is fully utilised, whereas with (b) it is possible that only one array element will be used from each cache line before it is evicted,

Having said that, some compilers can perform loop reordering optimisations, so in practice you may not see any difference if that happens. As always, you should benchmark/profile your code, rather than just guessing.

Paul R
  • 208,748
  • 37
  • 389
  • 560
  • 1
    I had never head of unit stride. I am reading about it now on wikipedia. Thanks for your answer :) – Samu Jan 06 '17 at 11:53
  • "Unit stride" effectively just means "sequentially" or "contiguously" in this context. – Paul R Jan 06 '17 at 11:54
  • 2
    @Samu: Literally "one step at a time". It's like picking up items in order as you walk down a supermarket aisle, rather than getting something from shelf 1, then walking down to get something from shelf 10, then walking back to shelf 2, then walking to shelf 11... In this analogy, your computer actually picked up everything from shelves 1-10 to begin with on the assumption that you could then cherry-pick whatever you wanted without doing any walking at all! And now it has to pick up everything from shelves 1-10, then everything from shelves 11-20, then everything from shelves 1-10 again... – Lightness Races in Orbit Jan 06 '17 at 11:55