There are far too many CPUs out there to make a general assumption about this, but:
If you're, let's say, on a common x86 architecture, then what the cache holds is always whole cache lines, each containing the first address you accessed that led to a cache miss; that is the same for forward access.
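As a rough illustration (assuming 64-byte cache lines and 4-byte `int`s, which is typical for current x86 parts but by no means guaranteed):

```c
#include <stddef.h>

/* Walking an array forward: the miss on data[0] pulls in one whole
 * cache line (assumed here to be 64 bytes = 16 ints), so data[1]..data[15]
 * are then hits; the next miss happens at data[16], and so on. */
long sum_forward(const int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}
```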
Depending on how sophisticated memory access prediction is, the backward access might also be prefetched; where that prediction happens depends on your CPU architecture, your actual CPU implementation, and your compiler. It's not uncommon for compilers to "know" which memory access patterns work well for a given CPU generation and to make sure memory accesses happen in that order.
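If you want to hand the hardware an explicit hint yourself, GCC and Clang expose `__builtin_prefetch`; whether it helps (or does anything at all) is entirely CPU-specific, so treat this as a sketch rather than a recommendation, and the prefetch distance below is a guess, not a tuned value:

```c
#include <stddef.h>

void scale(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Hint that we'll soon read src one assumed cache line (16 floats
         * = 64 bytes) ahead. Arguments: address, 0 = read, 1 = low
         * temporal locality. */
        if (i + 16 < n)
            __builtin_prefetch(&src[i + 16], 0, 1);
        dst[i] = src[i] * 2.0f;
    }
}
```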
For your very arithmetic case, there might even be automatic detection of, say, four consecutive, aligned addresses being accessed, and automatic vectorization using the SIMD instructions your CPU supports. That also affects the alignment with which the RAM is accessed, which might have an even further influence on cache behaviour.
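A loop of the following shape is a prime candidate for that kind of auto-vectorization (just a sketch; whether the compiler actually emits SSE/AVX here depends on the target, the flags, and what it can prove about aliasing and alignment):

```c
#include <stddef.h>

/* With something like `gcc -O3 -march=native`, loops of this shape are
 * typically turned into SIMD code processing 4 (SSE) or 8 (AVX) floats
 * per iteration. `restrict` promises the arrays don't overlap, which
 * makes the vectorization much easier for the compiler to justify. */
void add_arrays(float *restrict out, const float *restrict a,
                const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```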
Furthermore, since you seem to care about speed, you'd typically allow your compiler to optimize. In very many cases, this leads to such loops being "reversed", or even SIMD'ed.
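For instance, a backwards loop whose iterations are independent can legally be re-ordered by the optimizer, so what you wrote isn't necessarily what runs (again a sketch; the exact transformation depends on compiler and target):

```c
#include <stddef.h>

/* Written backwards, but since no iteration depends on another, an
 * optimizing compiler is free to traverse the data forwards and/or
 * vectorize it; at -O3 the generated code often ends up looking the
 * same as for the forward version. */
void halve_backwards(float *data, size_t n)
{
    for (size_t i = n; i-- > 0; )
        data[i] *= 0.5f;
}
```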
Note that other architectures might work differently: for example, there's an infamous family of Motorola DSPs from the mid-90s that had a relatively simple address generation unit, and things like accessing memory backwards could be fast if you (or your C compiler) knew how to tell it to work backwards; there was also the option to "fuse" a memory load or store with any other CPU instruction, so there your whole caching behaviour would effectively be dominated by how you manually specified the memory access patterns.