Can someone post a simple explanation of cache-aware algorithms? There are a lot of links available, but the reading material on those sites is academic in nature and time-consuming to read and comprehend.
2 Answers
A cache-aware algorithm is designed to minimize the movement of memory pages in and out of the processor's on-chip memory cache. The idea is to avoid what's called "cache misses," which cause the processor to stall while it loads data from RAM into the processor cache.
A cache-aware algorithm that is less than optimal on paper can outperform a traditional algorithm that is in theory "faster," because the cache-aware algorithm uses memory more efficiently.
A cache-aware algorithm is explicitly coded to take advantage of the processor's cache behavior. Intimate details about the processor's memory page size and "cache lines" are coded into the algorithm. As such, a cache-aware algorithm will be highly processor specific.
A cache-oblivious algorithm is coded to use memory in a more cache-friendly manner than a traditional algorithm, but it does not depend on intimate details about the underlying hardware.
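To make the distinction concrete, here is a minimal sketch (my own illustration, not from the answer) of a cache-aware loop tiling applied to a matrix transpose. The tile size `BLOCK` is an explicit, hard-coded assumption about the processor's cache, which is exactly the kind of hardware detail a cache-oblivious algorithm would avoid baking in:

```cpp
#include <vector>

// Cache-aware transpose: BLOCK is chosen so that a BLOCK x BLOCK tile of
// both src and dst fits in the cache at once. The value 64 is an assumed,
// processor-specific tuning parameter, not a universal constant.
constexpr int BLOCK = 64;

void transpose_blocked(const std::vector<int>& src,
                       std::vector<int>& dst, int n) {
    // Walk the matrix tile by tile so each tile stays cache-resident
    // while it is being read and written.
    for (int bi = 0; bi < n; bi += BLOCK)
        for (int bj = 0; bj < n; bj += BLOCK)
            for (int i = bi; i < bi + BLOCK && i < n; ++i)
                for (int j = bj; j < bj + BLOCK && j < n; ++j)
                    dst[j * n + i] = src[i * n + j];
}
```

A cache-oblivious version of the same idea would instead recurse, splitting the matrix into quadrants until the subproblems fit in cache at every level, without ever naming a tile size.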

- Hm, didn't he ask for examples?! – ljs Jan 24 '09 at 14:25
- No. The question said, "simple explanation." – Jim Mischel Jan 27 '09 at 16:52
- Here's a poor example: http://stackoverflow.com/a/11227902/845092, where sorting data before running his function made it 6 times faster. – Mooing Duck Aug 03 '12 at 17:07
I think one of the simplest examples of a cache-aware access pattern is traversing a two-dimensional array in row-major vs. column-major order. Since a two-dimensional array is usually stored in memory as a concatenation of all its rows, accessing it row by row brings the appropriate data into the cache at the right time. Accessing the array in column-major order, however, causes a long jump in memory on every step, and the resulting cache misses can cause a big slowdown.
To give an example, this C++ code:
const int MAX_N = 2048;
int a[MAX_N][MAX_N];          // rows are laid out contiguously in memory

for (int i = 0; i < MAX_N; ++i) {
    for (int j = 0; j < MAX_N; ++j) {
        a[i][j] = 10;         // row-major: walks memory sequentially, cache line by cache line
    }
}
runs 3-4 times faster on my machine than if I swap the indices of the accessed cell (that is, access a[j][i] instead).
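For anyone who wants to reproduce this, here is a self-contained harness (the function name and sizes are my own, not from the answer) that fills the same flat buffer in both orders and returns the elapsed time. The absolute numbers depend on the machine; on typical hardware the row-major order comes out well ahead:

```cpp
#include <chrono>
#include <vector>

// Fills an n x n matrix, stored as a flat row-major buffer, with the value 10.
// row_major == true  -> a[i * n + j]: sequential writes, cache-friendly.
// row_major == false -> a[j * n + i]: strided writes, one cache miss per element
//                        once n is large enough.
// Returns the elapsed wall-clock time in seconds.
double fill_matrix(std::vector<int>& a, int n, bool row_major) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            (row_major ? a[i * n + j] : a[j * n + i]) = 10;
    return std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();
}
```

Both orders write exactly the same cells; only the traversal order (and therefore the cache behavior) differs.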

- I believe this is technically an example of a [cache-oblivious algorithm](https://en.wikipedia.org/wiki/Cache-oblivious_algorithm), as it does not explicitly know the size of the cache. – Ben Jones Jul 02 '19 at 20:26