Avoiding powers of 2 for cache friendliness

Question

Suppose in speed-critical code we have a pair of arrays that are frequently used together, where the exact size doesn't matter, it just needs to be set to something reasonable, e.g.

int a[256], b[256];

Is this potentially a pessimization because the low address bits being the same can make it harder for the cache to handle both arrays simultaneously? Would it be better to specify e.g. 300 instead of 256?

You are correct to suspect that powers-of-two could be problematic. But it usually only applies when you have more than 2 strides. (especially when you exceed your L1 cache associativity) [Here's an example of where it actually becomes problematic.](http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops) In that example, there are 4 arrays - all of which are aligned to the same offset from the start of a 4k page. — Mysticial, Aug 08 '12 at 17:27

score 8 · Accepted Answer · edited May 23 '17 at 11:48

Moving my comment to an answer:

You are correct to suspect that powers-of-two could be problematic. But it usually only applies when you have more than 2 strides. It doesn't get really bad until you exceed your L1 cache associativity. But even before that you might run into false aliasing issues.

Here are two examples where powers-of-two actually become problematic:

Why are elementwise additions much faster in separate loops than in a combined loop?
Matrix multiplication: Small difference in matrix size, large difference in timings

In the first example, there are 4 arrays - all of which are aligned to the same offset from the start of a 4k page.

In the second example, the column-wise hopping of a matrix completely destroys performance when the size is a power-of-two.

In any case, note that the key concept is actually the alignment of the arrays, not the size of them. If you find that you are running into slow-downs, just add some padding between your arrays to break the alignment.

Another useful trick: if you only ever access entries one at a time (and never access a "slice" via memcpy or the like) you can try applying a trivial hash function to the indexes of the arrays. Usually, XOR. I.e. always access a[i^0x67] and b[j^0x34]. // I just found a place where that helps. — Krazy Glew, Oct 26 '12 at 04:28

Avoiding powers of 2 for cache friendliness

1 Answers1