
I understand how temporal and spatial locality affect design decisions when coding, and I also understand when alignment affects cache performance. However, could somebody please demonstrate an example of C++ code where cache associativity is taken into account to make it faster?

Let's say x86 on an Intel CPU, where the L1 cache is 8-way set associative, the L2 is 8-way set associative, and the L3 is 16-way set associative.

(My overall aim with this question is to understand how set associativity affects performance when writing code, and how to "program to the hardware" to gain performance when you know the target architecture.)

Jonathon Reinhart
user997112
  • http://stackoverflow.com/questions/12264970/why-is-my-program-slow-when-looping-over-exactly-8192-elements, http://stackoverflow.com/questions/6060985/why-huge-performance-hit-2048x2048-array-versus-2047x2047, http://stackoverflow.com/questions/11413855/why-is-transposing-a-matrix-of-512x512-much-slower-than-transposing-a-matrix-of – Mysticial Jun 19 '14 at 00:55
  • @Mysticial just reading Agner Fog's C++ manual and he says: "Variables whose distance in memory is a multiple of the critical stride will contend for the same cache lines." Shouldn't that last part read "cache sets", not "cache lines"? I presume he is talking about cache lines replacing other cache lines within the same set? – user997112 Jun 19 '14 at 01:41
