I am trying to learn how caches work. In some sample experiments, the execution time of a program that iterates through an array and does some operations on its elements suddenly increases sharply once I grow the array beyond a particular size. Can anyone explain in simple terms how cache size and array size affect the performance of mathematical operations on an array?
-
It kinda depends on what sort of mathematical operations you are doing. Can you be more specific? – paddy Sep 29 '13 at 21:47
-
http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code – assylias Sep 29 '13 at 21:54
-
http://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra – assylias Sep 29 '13 at 22:07
1 Answer
If the cache cannot hold the entire array, any reference to an element that is not in the cache results in a cache miss. The way you access the array elements also makes a difference, because on every miss the processor brings a whole block of data into the cache, on the assumption that nearby data will be needed soon, preparing itself to avoid future cache misses.
Example:
If you operate on elements at consecutive memory locations, performance improves, because on the first cache miss the processor fetches a whole cache line, so the neighbouring elements are already in the cache by the time you reach them.
For example, take matrix multiplication, done the usual way.
Assume: the matrices are too large to fit in the cache.
for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        A[i*N + j] = (double) random() / SOME_NUMBER;

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        B[i*N + j] = (double) random() / SOME_NUMBER;

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        for (k = 0; k < N; k++)
            C[i*N + j] = C[i*N + j] + A[i*N + k]*B[k*N + j];
Here, while computing each row of the product, we access the second matrix column-wise. In B the first column is stored at B[0], B[N], B[2N], ... and so on, which are not consecutive memory locations. Hence there will be a lot of cache misses. If we can reshape the solution so that we deal with consecutive memory locations, we can gain some performance. Instead of storing the second matrix in row-major form, we can store it in column-major form, i.e. transposed, so that all the elements of a column sit in consecutive locations.
A will be accessed row-wise and B logically column-wise, but because B is now stored transposed, both traversals walk consecutive memory locations.
So when the processor experiences the first miss, it fetches a whole block of memory, and the subsequent misses are avoided. Here we have exploited 'Spatial Locality'. (A further optimization, 'Cache Blocking' or tiling, partitions the matrices into sub-blocks that fit in the cache to exploit temporal locality as well.)
Now, if we change the code as follows, there will be a significant improvement in performance. Store B in transposed form:
B[j*N + i] = random() / SOME_NUMBER;
You'll also have to access the transposed array in that order:
C[i*N + j] = C[i*N + j] + A[i*N + k]*B[j*N + k];
-
Thanks for the answer. But my question is why does the execution time increase suddenly when the cache size is reached? – Doubting Sep 29 '13 at 22:52
-
Because after that, any reference to a non-cached element is a cache miss. The processor will then look in L2, then L3 (if there is a multi-level cache), and finally in main memory. A main-memory access costs many more CPU cycles than a cache access. – Vallabh Patade Sep 29 '13 at 22:58
-
But this happens for each cache miss, right? So performance depends on whether the data already loaded into the cache gets reused, not on whether the cache is full. So why does the cache miss rate depend on the cache size? – Doubting Sep 29 '13 at 23:05
-
A cache miss is when an element is not in the cache. An element may be absent for several reasons: the cache is full, the element was not prefetched due to poor organization of the data relative to the cache-line size, or the access pattern lacks temporal and spatial locality. – Vallabh Patade Sep 29 '13 at 23:13
-
I have an array and some operations are done on it. I have a single-level cache of size 1 MB with a cache line size of 32 bytes. Then, each time an array element not currently in the cache is requested, 32 bytes are loaded into the cache. When the cache is full the same thing happens; the only difference is that a cache line needs to be evicted. So the cache miss count should depend only on the 32-byte cache-line size. I still don't get the part the cache size plays. – Doubting Sep 30 '13 at 08:20
-
As long as each element is only used once, cache size isn't important, as every load will be a cache miss. But most algorithms will touch that data again at a later time. If the cache is large enough, the second (and third, and fourth, ...) request to an element will be served from the cache (a cache hit). But if the cache is too small to hold the whole array, some or even all of these secondary accesses will generate cache misses instead. – Jan Lucas Nov 09 '16 at 07:29