
I was reading a book that contains this paragraph:

Arrays in C can be seen as a contiguous chunk of memory. More precisely, the last dimension of the array is the contiguous part. We call this the row-major order. Understanding this and the fact that a cache fault loads a complete cache line into the cache when accessing uncached data to prevent subsequent cache faults, we can see why accessing an array of dimension 10000x10000 with array[0][0] would potentially load array[0][1] in cache, but accessing array[1][0] right after would generate a second cache fault, since it is sizeof(type)*10000 bytes away from array[0][0], and therefore certainly not on the same cache line. Which is why iterating like this is inefficient:

#define ARRLEN 10000

int array[ARRLEN][ARRLEN];
size_t i, j;

for (i = 0; i < ARRLEN; ++i)
{
    for(j = 0; j < ARRLEN; ++j)
    {
        array[j][i] = 0;
    }
}
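
For contrast, a minimal sketch of the cache-friendly order (not from the book; it reuses the same declarations), where the inner loop walks along the last, contiguous dimension:

for (i = 0; i < ARRLEN; ++i)
{
    for (j = 0; j < ARRLEN; ++j)
    {
        array[i][j] = 0;    /* consecutive j values are adjacent in memory, so most
                               accesses hit the cache line that is already loaded */
    }
}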

Can you explain what they are trying to say in this paragraph, and what is the "cache fault" they are talking about?

IrAM
  • They're referring to [CPU cache](https://en.wikipedia.org/wiki/CPU_cache) – Arkadiusz Drabczyk Jan 03 '21 at 13:35
  • I think the main thing this paragraph is trying to say is that when you iterate over the array like that, you jump 10000 elements each iteration, instead of 1 each time as you would with `array[i][j]`. If you use `array[i][j]`, the cache stores a chunk of the array to make future accesses faster, but when you jump 10000 elements each time it can't store enough without wasting lots of resources. – Roy Avidan Jan 03 '21 at 13:36
  • When you access `array[0][0]` the next elements (`array[0][0…n]`) are probably pre-cached with the assumption that they will likely be accessed next. However, if the access jumps to `array[1][0]` instead, that assumption was false and the cached `array[0][1]` and consecutive elements are unnecessary and discarded in favour of `array[1][0…n]`. – Arkku Jan 03 '21 at 13:36
  • It's not just bad for caching. Modern CPUs also do automatic prefetching, so contiguous accesses are important even beyond the size of an individual cacheline. Unit stride is also more likely to be vectorizable without requiring scatter/gather instructions, which might be unavailable or slow. – EOF Jan 03 '21 at 13:37
  • Does this answer your question? [What is a cache hit and a cache miss? Why would context-switching cause cache miss?](https://stackoverflow.com/questions/18559342/what-is-a-cache-hit-and-a-cache-miss-why-would-context-switching-cause-cache-mi) The problem is caused by the use of the unusual term "cache fault", which should have been "cache miss" instead. The author probably confused it with "page fault", which - unfortunately - is also some kind of cache miss. – Ulrich Eckhardt Jan 03 '21 at 13:43

1 Answer


Think of the array as pages in a book. If each page holds 1024 characters, then an array declared as a[100][1024] is like a 100-page book. It is more efficient to read the book page by page. That is, you iterate in the order a[0][0], a[0][1], ..., a[0][1023], a[1][0], i.e., you read a full page and then turn the page. If you iterate over the leftmost index instead, it is like reading one character from each page, turning the page after every single character, then going back to page 1 once you reach the end of the book to read the second character of each page. Turning the page is a cache fault.
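
A rough timing sketch of that difference (assuming a smaller 1000x1000 int array, clock() for timing, and an unoptimized build, since an optimizer may replace these stores with memset or drop them entirely):

#include <stdio.h>
#include <time.h>

#define N 1000

static int a[N][N];

int main(void)
{
    clock_t t0, t1;
    size_t i, j;

    /* row-major: "read each page in order" */
    t0 = clock();
    for (i = 0; i < N; ++i)
        for (j = 0; j < N; ++j)
            a[i][j] = 0;
    t1 = clock();
    printf("row-major:    %f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    /* column-major: "one character per page, then turn the page" */
    t0 = clock();
    for (i = 0; i < N; ++i)
        for (j = 0; j < N; ++j)
            a[j][i] = 0;
    t1 = clock();
    printf("column-major: %f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    return 0;
}

On a typical machine the second loop is noticeably slower; the exact ratio depends on cache sizes and the hardware prefetcher.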

William Pursell
  • This is a great analogy. – ojblass Jan 03 '21 at 13:45
  • Can you tell me what they mean by this part: "Understanding this and the fact that a cache fault loads a complete cache line into the cache when accessing uncached data to prevent subsequent cache faults"? – Neeraj-Kumar-Coder Jan 03 '21 at 13:49
  • The cache fault is the kernel realizing that it doesn't have the book open to the right page, so it brings that page into the cache to be read. – William Pursell Jan 03 '21 at 14:07
  • @Neeraj-Kumar-Coder It's common for a CPU cache line to be something like 64 bytes. Accessing a **single** `char` or `int` from memory means reading an entire 64-byte block of memory into cache and possibly writing it back to memory. Doing that repeatedly can really hurt performance. Cache-friendly code can make a huge difference. – Blastfurnace Jan 03 '21 at 14:12 (a small numeric sketch of this follows after these comments)
  • Re “… the kernel realizing…”: Cache operations are performed by the hardware, not the operating system kernel. – Eric Postpischil Jan 03 '21 at 14:22
  • Maybe it's stretching the analogy, but I suppose a page fault would be turning the page, and a cache fault would be your eyes moving to the next line on the page. – William Pursell Jan 03 '21 at 14:24
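
To put a number on the cache-line comment above, a minimal sketch (assuming a 64-byte line and a 4-byte int, which are typical but not guaranteed):

#include <stdio.h>

#define CACHE_LINE_BYTES 64   /* typical on x86-64; not guaranteed by the C standard */

int main(void)
{
    /* With 4-byte ints, one 64-byte line holds 16 consecutive elements, so a
       row-major sweep misses roughly once per 16 accesses, while a column-major
       sweep of a 10000-wide array can miss on essentially every access. */
    printf("ints per cache line: %zu\n", CACHE_LINE_BYTES / sizeof(int));
    return 0;
}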