Example of cache friendly code

Question

template<size_t size>
class Objects{
  std::array<int,size> a;
  std::array<int,size> b;
  std::array<int,size> c;

  void update(){
    for (size_t i = 0; i < size; ++i){
      c[i] = a[i] + b[i];
    }
  }
};

I am gathering information of how to write cache friendly code since a week and I read though several articles but I still haven't understood the basics.

Code like I have written above is used in most of the examples, but for me this is not cache friendly at all.

For me the memory layout should look like this

aaaabbbbcccc

and in the first loop it will access

[a]aaa[b]bbb[c]ccc

If I understood it correctly, the cpu prefetches elements that are near in memory. I am not sure how intelligent this method is but I assume it's primitive and it just fetches the nth nearest elements.

The problem is that [a]aaa[b]bbb[c]ccc will not access the elements in order at all. So it might fetch the next '3' elements a[aaa]bbbbcccc which is nice for the next a because it will be a cache hit but not for the b.

Is the example above cache friendly code?

The cache holds several "areas" of RAM simultaneously. You should study the concept of "cache lines". — molbdnilo, Aug 15 '14 at 19:49
@molbdnilo Thanks that is probably the missing piece of information that I need. — Maik Klein, Aug 15 '14 at 19:53

score 2 · Accepted Answer · answered Aug 15 '14 at 20:03

I suggest you use an array of structures:

struct Cache_Item
{
  int a;
  int b;
  int c;
};

Cache_Item cache_line[size];

for (unsigned int i = 0; i < size; ++i)
{
  cache_line[i].c = cache_line[i].a + cache_line[i].b;
}

The structure arrangement allows for all the variables in use to be next to each other in the cache line or very close.

In your array method, element b[0] ideally is at location a[size], so they are size items apart. This could mean that they are on different cache lines. The result location, c[0], is at a[size + size], which means it could be 2 cache lines away.

score 1 · Answer 2 · answered Oct 17 '14 at 17:39

Your code is not particularly unfriendly. It requires three active cache lines at a time instead of one, but that isn't too much to ask. Your code would be a lot more cache-unfriendly if instead of

std::array<int,size> a;

you had

std::array<struct { int x; char description[5000]; }, size> a;

because then the CPU would have to pick out the lone x among the thousands of bytes of description (which your loop never uses).

Your example would also be more cache-unfriendly if you had not just a, b, and c, but also d-z and aa-az and maybe a few more. (How far you have to go depends on the sophistication of your cache - how many way-associative it is, etc.)

Have you profiled yours vs Thomas Matthews' code?

score 0 · Answer 3 · edited May 23 '17 at 10:26

0

You should trust the compiler optimization work (and of course enable optimizations); it probably deals quite well with the CPU cache (perhaps by issuing appropriate prefetch instructions).

Sometimes you can hint the compiler thru builtins or pragmas. For example with GCC on x86-64 you might -with care- use the __builtin_prefetch. Usually it is not worth the effort (and if you misuse it performance will suffer). See this answer to a related question.

edited May 23 '17 at 10:26

Community

1
1

answered Aug 15 '14 at 20:12

Basile Starynkevitch

223,805
18
296
547

The compiler is not free to rearrange struct layouts - it's bound by the platform's ABI. While "trust the compiler" is a good meta-answer, this isn't a meta-question. – Bernd Jendrissek Oct 17 '14 at 16:28
The compiler could rearrange layout of struct if it can prove that the struct data does not escape its compilation unit. GCC used to have such an optimization pass. – Basile Starynkevitch Oct 17 '14 at 17:22

Example of cache friendly code

3 Answers3