Cache performance degradation due to physical layout of data

Question

Each memory address "maps" to their own cache set in the CPU cache(s), based on a modulo operation of the address.

Is there a way in which accessing two identically-sized arrays, like so:

int* array1;  //How does the alignment affect the possibility of cache collisions?
int* array2;

for(int i=0; i<array1.size(); i++){
    x = array1[i] * array2[i];   //Can these ever not be loaded in cache at same time?
}

can cause a performance decrease because the element at array1[i] and array2[i] give the same cache line modulo result? Or, would this actually be a performance increase because only one cache line would have to be loaded to obtain two data locations?

Would somebody be able to give an example of the above showing performance changes due to cache mappings, including how the alignment of the arrays would affect this?

(The reason for my question is that I am trying to understand when a performance problem occurs due to data alignment/address mappings to the same cache line, which causes one of the pieces of data to not be stored in the cache)

NB: I may have mixed up the terms cache "line" and "set"- please feel free to correct.

Examples: http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops and http://stackoverflow.com/questions/12264970/why-is-my-program-slow-when-looping-over-exactly-8192-elements — Mysticial, May 18 '14 at 05:17
Do you think that there exist two addressesa which cannot be simultaneously cached? This is not true. — n. m. could be an AI, May 18 '14 at 05:22
@n.m. Technically, for a *direct-mapped* cache, just two addresses can produce a conflict miss. Direct-mapped caches are now rare even in embedded systems, but historically direct-mapping was frequently used. — , May 18 '14 at 06:17
if array1 and array2 can fit in the same cache line, then yes there should be a performance increase, providing your logic access their elements straightaway (rather than using them as pointers and load something else created on the heap which would therefore pollute your cache) — swang, May 18 '14 at 20:58

score 0 · Accepted Answer · answered May 19 '14 at 23:13

Right now your code doesn't make much sense as you didn't allocate any memory for the arrays. The pointers are just 2 uninitialized vars sitting on the stack and pointing at nothing. Also, a pointer to int* doesn't really have size() function.

Assuming you fix all that, if you do allocate, you can decide whether to allocate the data contiguously or not. You could allocate 2*N integers for one pointer, and have the other point to the middle of that region.

The main consideration here is this - if the arrays are small enough as to not wrap around your desired cache level, having them mapped contiguously will avoid having to share the same cache sets between them. This may improve performance since simultaneous accesses to the same sets are often non-optimal due to HW considerations.

The thrashing consideration (will the two arrays throw each others' lines out of the cache) is not a problem really as most caches today enjoy some level of associativity - it means that the arrays can map to the same sets but live in different cache ways. If the arrays are too big and exceed the total number of ways together, then it means their address range wraps around the cache set mapping several times, in which case it doesn't really matter how it's aligned, you're still going to collide with some lines of the other array

for e.g., if you had 4 sets and 2 ways in the cache, and try mapping 2 arrays of 64 ints with an alignment offset, you'd still fill out your entire cache -

          way0        way1     
set 0   array1[0]   array2[32]
set 1   array1[16]  array2[48]
set 2   array1[32]  array2[0]
set 3   array1[48]  array2[16]

but as mentioned above - accesses within the same iteration would go to different sets, which may have some benefit.

Cache performance degradation due to physical layout of data

1 Answers1