How would you use matrix transposition to optimize this code for caches?

for (int i = 0; i < SIZE; i++) {
    for (int j = 0; j < SIZE; j++) {
        dest[i][j] = src[j][i];
    }
}

1 Answer

You have to know the machine architecture to do this properly. In general, though, you want to divide the work among N - 1 threads, where N is the number of threads available and one is held back for the main manager thread. Break each thread's memory reads and writes into blocks that are aligned to the cache-line size, so the threads don't fight over the same cache lines on the memory bus.
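As a rough single-threaded illustration of the cache-line-sized blocks described above, the loop can be tiled so that each small square of src and dest is finished before moving on. This is only a sketch under assumptions that are not in the question: the element type is taken to be int, SIZE is set to 1000, and BLOCK (a hypothetical tile edge) is 16, which matches a 64-byte cache line only if int is 4 bytes. The right value depends on the machine.

```c
#include <stddef.h>

#define SIZE  1000  /* assumed dimension; the question does not give one */
#define BLOCK 16    /* assumed tile edge: 16 ints = 64 bytes on many CPUs */

/* Transpose src into dest one BLOCK x BLOCK tile at a time, so the
 * strided reads from src stay within a small set of cache lines
 * while a tile is being processed. */
void transpose_blocked(int dest[SIZE][SIZE], int src[SIZE][SIZE])
{
    for (size_t ii = 0; ii < SIZE; ii += BLOCK) {
        /* Clamp the tile so SIZE need not be a multiple of BLOCK. */
        size_t i_end = (ii + BLOCK < SIZE) ? ii + BLOCK : SIZE;

        for (size_t jj = 0; jj < SIZE; jj += BLOCK) {
            size_t j_end = (jj + BLOCK < SIZE) ? jj + BLOCK : SIZE;

            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    dest[i][j] = src[j][i];
                }
            }
        }
    }
}
```

To spread this over threads, one option is to give each worker a disjoint range of the outer ii loop. Each worker then writes its own rows of dest, which keeps two threads from writing to the same cache line as long as the rows themselves are cache-line aligned.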

Paul Evans