
I came across this great YouTube tutorial, and in one of the slides I saw something I didn't understand. Why is this happening? Is it compiler related?

Version #1 looks like this:

const int N = 5000;
float a [N*N];

for (int x=0; x<N; ++x)
     for(int y=0; y<N; ++y)
          sum+=a[x+y*N];

and takes about 239.4ms to execute.

And version #2 looks like this:

const int N = 5000;
float a [N*N];

for (int y=0; y<N; ++y)
     for(int x=0; x<N; ++x)
          sum+=a[x+y*N];

and takes about 79.5ms to execute. Why is this happening?

chendoy
  • @user202729 well I did not link my question to another site, I just added a reference to the actual video for credit, meaning I'm not the owner of the content. But thanks for your note. – chendoy Oct 20 '18 at 13:09
  • Changed it, noted to myself. – chendoy Oct 20 '18 at 13:19

1 Answer


The second example demonstrates better data locality, since it accesses elements of the same row one after another. Basically it performs a sequential memory read, while the first example jumps over sizeof(float) * N bytes on each iteration, putting extra stress on the CPU cache / memory.

user7860670
  • Can it be that the compiler optimizes the value of the multiplication, so in the second case it does not have to reevaluate the value of y*N on every loop iteration? – kangaro0 Oct 20 '18 at 13:04
  • 1
    @kangaro0 Actually I think that multiplication can be optimized (by converting it into addition) in both cases. In the first case it will probably be just `+ N`, and `+ 1` in the second case. – user7860670 Oct 20 '18 at 13:08
  • Accessing contiguous memory is always faster. As @VTT said, data locality! – 7raiden7 Oct 20 '18 at 13:31