In the textbook Computer Systems: A Programmer's Perspective there are some impressive benchmarks showing how much faster row-major order access is than column-major access.
I wrote a small program to test for myself whether a simple change from row-major to column-major access would make a large difference on my own machine.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 30000

int a[N][N] = { 0 };

int main(void) {
    srand(time(NULL));
    int sum = 0;

    /* Fill the array with random values (row-major order). */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            a[i][j] = rand() % 99;
        }
    }

    /* Row-major traversal: the inner loop walks contiguous memory. */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            sum += a[i][j];
        }
    }

    /* Use the result so the summation loop can't be optimized away. */
    printf("%d\n", sum);
    return 0;
}
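For reference, the column-major variant only differs in the summation loop; swapping the subscripts makes the inner loop stride down a column instead of across a row (sketch; the fill loop stays the same):

/* Column-major traversal: consecutive accesses are N * sizeof(int)
   bytes apart, so spatial locality is lost. */
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        sum += a[j][i];
    }
}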
On average, row-major order access took 8.42 s (n = 5 trials) on my system, whereas column-major order access took 30.12 s (n = 5 trials), which is a pretty significant difference.
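If anyone wants to reproduce this, the traversal itself can be timed in-program with clock() from <time.h> (already included above). This is just one possible harness, shown as a sketch; it would replace the plain summation loop inside main():

/* Time one traversal of the array and report the elapsed CPU time. */
clock_t start = clock();
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        sum += a[i][j];
    }
}
clock_t end = clock();
printf("elapsed: %.2f s\n", (double)(end - start) / CLOCKS_PER_SEC);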
On the surface it seems like this should be a pretty simple thing to optimize.
Why don't modern compilers optimize these scenarios?