Let's take a small and simple 2D array:
int a[2][2];
In memory it's laid out like this:
+---------+---------+---------+---------+
| a[0][0] | a[0][1] | a[1][0] | a[1][1] |
+---------+---------+---------+---------+
In the first variant of the loops, you iterate in the order a[0][0], a[0][1], a[1][0], and a[1][1]. You iterate over elements that sit next to each other in memory, so they are already in the cache.
In the second variant you iterate in the order a[0][0], a[1][0], a[0][1], and a[1][1].
See how you jump back and forth in the second variant? With larger arrays, the CPU has to fetch data from main memory more often, because the whole array no longer fits in the cache. More memory accesses mean lower efficiency and longer execution times.